Probabilistic Topic Models in Natural Language Processing
Faculty of Economics and Business (Wirtschaftswissenschaftliche Fakultät)
In machine learning, topic models serve to discover abstract structures in large document collections. I present a tailored selection of concepts from information theory and statistics to build a solid foundation for understanding topic models. The concepts are presented together with theorems, examples, and visualizations. I focus on two models in particular: Latent Dirichlet Allocation and the Dynamic Topic Model. Applications written in Python demonstrate possible use cases, among them matching news articles with similar content and exploring how the topics of news articles evolve over time. The objective of this work is to guide the reader from a basic understanding of statistics, such as is typically acquired in undergraduate studies, to an understanding of topic models.
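As a rough illustration of the article-matching use case: once a topic model such as LDA has reduced each article to a document-topic distribution (a probability vector over K topics), finding a similar article becomes a nearest-neighbour search between distributions. The sketch below assumes the topic proportions are already available and scores closeness with the Hellinger distance. The metric choice, the function names `hellinger` and `most_similar`, and the toy vectors are all hypothetical illustrations, not the method used in this work.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns 0 for identical distributions and 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def most_similar(query, corpus):
    """Return (index, distance) of the corpus distribution closest to query."""
    return min(
        ((i, hellinger(query, theta)) for i, theta in enumerate(corpus)),
        key=lambda pair: pair[1],
    )


# Hypothetical document-topic proportions for three articles over K = 3 topics
articles = [
    [0.80, 0.15, 0.05],  # dominated by topic 0
    [0.10, 0.10, 0.80],  # dominated by topic 2
    [0.30, 0.40, 0.30],  # evenly mixed
]
new_article = [0.75, 0.20, 0.05]  # also dominated by topic 0

print(most_similar(new_article, articles))
```

Here the first article (index 0) is returned as the closest match, since both articles put most of their probability mass on the same topic. The Hellinger distance is used because it is symmetric and bounded; an asymmetric measure such as the Kullback-Leibler divergence would be an alternative choice.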
Related items (similar title, author, creator or subject):
- 2019-10-08, journal article: A paradigm shift in German family policy. Gülzau, Fabian. This article explores the newspaper discourse surrounding a paradigm shift in social policy. The case at hand, Germany, is a prime example of a welfare state that was particularly resistant to reform. Hence, the rapid ...
- 2021-04, discussion paper: Turn, Turn, Turn. A Digital History of German Historiography, 1950-2019. Wehrheim, Lino; Jopp, Tobias A.; Spoerer, Mark. The increasing availability of digital text collections and the corresponding establishment of methods for computer-assisted analysis open up completely new perspectives on historical textual sources. In this paper, we use ...
- 2022-06, discussion paper: A mirror to the world. Taking the German News Magazine Der Spiegel into a Topic Modeling/Sentiment Perspective. Wehrheim, Lino. The importance of mass media is reflected, among other things, in the fact that their coverage on certain topics – contrary to findings from communication research – is often seen as a reflection of the topics that are ...