Automatic Classification of the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft
Philosophische Fakultät
Classification systems are among the most established methods of knowledge organization and offer many advantages, yet the collection of the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft (BHR) lacks a classification scheme. One objective of this thesis is therefore to obtain a classification system for the collection and, where possible, to use Machine Learning (ML) methods to assign the BHR documents automatically to that system. The research questions are whether the JITA Classification System of Library and Information Science (JITA) is an appropriate classification system for the BHR, and whether automatic classification with ML can assign the documents of the collection to a classification system without using BHR data in the training dataset. To evaluate JITA, an evaluation checklist was created based on recommendations from the cited literature. Using this checklist, it was concluded that JITA is not suitable as a classification system for the BHR. A new classification system was therefore created, using the same checklist as a reference. Neither expert evaluations nor user studies were conducted, which is a clear limitation of this thesis. After a suitable classification scheme for the BHR had been created, titles and abstracts of documents from different sources were scraped to serve as the training set for the ML experiments. Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression classifiers, as well as Deep Learning classifiers built with the FLAIR framework, were tested. None of the resulting models yielded satisfactory results, so no further experiments classifying the BHR documents were conducted. It was concluded that automatic classification of the BHR documents is not possible without a BHR training set.
Several limitations, especially in the creation of the training set, may have contributed to the unsatisfactory results; these are discussed in the thesis. The thesis offers a basis for future studies that aim to evaluate classification schemes and for further Text Classification experiments.