Probabilistic Topic Models in Natural Language Processing
Wirtschaftswissenschaftliche Fakultät (Faculty of Business and Economics)
In machine learning, topic models serve to discover abstract structures in large document collections. I present a tailored selection of concepts from information theory and statistics to build a solid foundation for understanding topic models. The concepts presented include theorems as well as examples and visualizations. I focus on two models in particular: Latent Dirichlet Allocation and the Dynamic Topic Model. Examples written in the Python programming language illustrate possible use cases, including matching news articles with similar content and analyzing how news topics evolve over time. The objective of this work is to guide the reader from a basic understanding of statistics, such as is typically acquired in undergraduate studies, to an understanding of topic models.
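The abstract names matching news articles with similar content as one application. The thesis's own Python code is not reproduced on this page, so the following is only a minimal sketch under stated assumptions. It fits Latent Dirichlet Allocation to a tiny hypothetical corpus by collapsed Gibbs sampling and ranks documents by the Jensen-Shannon divergence between their topic proportions, an information-theoretic measure. Collapsed Gibbs sampling is one common inference method for LDA, chosen here for brevity, and is not necessarily the method used in the thesis. The function names `lda_gibbs`, `jensen_shannon` and `most_similar`, the toy corpus, and the hyperparameters `alpha=0.1` and `beta=0.01` are illustrative assumptions.

```python
# Illustrative sketch (assumption): LDA via collapsed Gibbs sampling plus
# Jensen-Shannon similarity. All names and data here are hypothetical.
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=300, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    estimates p(topic k | document d) and phi[k][v] estimates p(word v | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic assignment of every token

    # Assign every token to a random topic to initialize the counts.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (requires q > 0 where p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric Jensen-Shannon divergence in bits, bounded by [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_similar(theta, d):
    """Index of the document whose topic mixture is closest to document d."""
    return min((j for j in range(len(theta)) if j != d),
               key=lambda j: jensen_shannon(theta[d], theta[j]))


# Hypothetical toy "news articles", already tokenized and stop-word filtered.
docs = [
    "bank debt bond rate bank market debt loan".split(),
    "market bond rate bank loan debt rate bond".split(),
    "goal match team league player goal team coach".split(),
    "league player match goal coach team match league".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, seed=42)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print("topic", k, [vocab[v] for v in top])
for d in range(len(docs)):
    print("doc", d, "most similar to doc", most_similar(theta, d))
```

Because Gibbs sampling is stochastic, the topics found on such a small corpus can vary with the seed. For real collections, a library implementation such as gensim's `LdaModel` (and `LdaSeqModel` for the Dynamic Topic Model) would be the more practical choice; the pure-Python version only makes the sampling update visible.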
Related Items
- Visualising Topics in Document Collections. Christoforidis, Anastasia; Heuwing, Ben; Mandl, Thomas. Conference publication, 2017-03-24. This paper discusses two multivariate visualisations which provide insights into topic model distributions across subcollections of a collection of historical textbooks in the context of a digital humanities project. ...
- Fictional expectations and the global media in the Greek debt crisis. Daniel, Volker; Neubert, Magnus; Orban, Agnes. Discussion paper, 2018-03-22. We study the role of global media during the Greek debt crisis and relate it to the transmission of events on financial actors' expectations. To identify news coverage about the Greek debt crisis, we apply topic modeling ...
- Topic modeling for analyzing open-ended survey responses. Pietsch, Andra-Selina; Lessmann, Stefan. Journal article, 2019-04-16. Open-ended responses are widely used in market research studies. Processing of such responses requires labour-intensive human coding. This paper focuses on unsupervised topic models and tests their ability to automate the ...