
2017-03-24 Conference publication DOI: 10.18452/1447
news-please
dc.contributor.author: Hamborg, Felix
dc.contributor.author: Meuschke, Norman
dc.contributor.author: Breitinger, Corinna
dc.contributor.author: Gipp, Bela
dc.contributor.editor: Gäde, Maria
dc.contributor.editor: Trkulja, Violeta
dc.contributor.editor: Petras, Vivien
dc.date.accessioned: 2017-06-15T13:06:41Z
dc.date.available: 2017-06-15T13:06:41Z
dc.date.created: 2017-03-24
dc.date.issued: 2017-03-24 (none)
dc.identifier.other: http://edoc.hu-berlin.de/conferences/isi2017/hamborg-felix-218/PDF/hamborg.pdf
dc.identifier.uri: http://edoc.hu-berlin.de/18452/2099
dc.description.abstract: The amount of news published and read online has increased tremendously in recent years, making news data an interesting resource for many research disciplines, such as the social sciences and linguistics. However, large-scale collection of news data is cumbersome due to a lack of generic tools for crawling and extracting such data. We present news-please, a generic, multilanguage, open-source crawler and extractor for news that works out-of-the-box for a large variety of news websites. Our system allows crawling arbitrary news websites and extracting the major elements of news articles on those websites, i.e., title, lead paragraph, main content, publication date, author, and main image. Compared to existing tools, news-please features full website extraction requiring only the root URL. (eng)
dc.language.iso: eng
dc.publisher: Humboldt-Universität zu Berlin
dc.relation.ispartofseries: Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017), isi2017, 13.03.2017 - 15.03.2017, Berlin, pp. 218-223
dc.subject: news crawler (eng)
dc.subject: news extractor (eng)
dc.subject: scraper (eng)
dc.subject: information extraction (eng)
dc.subject.ddc: 020 Library and information sciences
dc.title: news-please
dc.type: conferenceObject
dc.subtitle: A Generic News Crawler and Extractor
dc.identifier.urn: urn:nbn:de:kobv:11-100245315
dc.identifier.doi: http://dx.doi.org/10.18452/1447
local.edoc.container-title: Everything Changes, Everything Stays the Same? Understanding Information Spaces. Proceedings of the 15th International Symposium of Information Science (ISI 2017)
local.edoc.container-title: isi2017
local.edoc.pages: 6
local.edoc.type-name: Conference publication
local.edoc.container-type: conference
local.edoc.container-type-name: Conference
local.edoc.container-event: 13.03.2017 - 15.03.2017
local.edoc.container-event: Berlin
local.edoc.container-firstpage: 218
local.edoc.container-lastpage: 223
