
2017-03-24 · Conference publication · DOI: 10.18452/1447
news-please
dc.contributor.author: Hamborg, Felix
dc.contributor.author: Meuschke, Norman
dc.contributor.author: Breitinger, Corinna
dc.contributor.author: Gipp, Bela
dc.date.accessioned: 2017-06-15T13:06:41Z
dc.date.available: 2017-06-15T13:06:41Z
dc.date.created: 2017-03-24
dc.date.issued: 2017-03-24
dc.identifier.isbn: 978-3-86488-117-6
dc.identifier.uri: http://edoc.hu-berlin.de/18452/2099
dc.description.abstract: The amount of news published and read online has increased tremendously in recent years, making news data an interesting resource for many research disciplines, such as the social sciences and linguistics. However, large-scale collection of news data is cumbersome due to a lack of generic tools for crawling and extracting such data. We present news-please, a generic, multilanguage, open-source crawler and extractor for news that works out-of-the-box for a large variety of news websites. Our system allows crawling arbitrary news websites and extracting the major elements of news articles on those websites, i.e., title, lead paragraph, main content, publication date, author, and main image. Compared to existing tools, news-please features full website extraction requiring only the root URL. [eng]
dc.language.iso: eng
dc.publisher: Humboldt-Universität zu Berlin
dc.subject: news crawler [eng]
dc.subject: news extractor [eng]
dc.subject: scraper [eng]
dc.subject: information extraction [eng]
dc.subject.ddc: 020 Library and information sciences
dc.title: news-please
dc.type: conferenceObject
dc.subtitle: A Generic News Crawler and Extractor
dc.identifier.urn: urn:nbn:de:kobv:11-100245315
dc.identifier.doi: http://dx.doi.org/10.18452/1447
local.edoc.container-title: Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017)
local.edoc.container-text: Published in the series "Schriften zur Informationswissenschaft", volume 70
local.edoc.pages: 6
local.edoc.type-name: Conference publication
local.edoc.institution: Philosophische Fakultät
local.edoc.container-type: book
local.edoc.container-type-name: Book
local.edoc.container-publisher-name: Verlag Werner Hülsbusch
local.edoc.container-publisher-place: Glückstadt
local.edoc.container-event: Everything Changes, Everything Stays the Same? Understanding Information Spaces. 15th International Symposium of Information Science (ISI 2017), Berlin, Germany, 13.03.2017 - 15.03.2017
local.edoc.container-periodicalpart-creator: Maria Gäde, Violeta Trkulja, Vivien Petras (eds.)
local.edoc.container-firstpage: 218
local.edoc.container-lastpage: 223
