2007-09-26, Diskussionspapier (discussion paper), DOI: 10.18452/4079
Conditional Complexity of Compression for Authorship Attribution
dc.contributor.author: Malyutov, Mikhail B.
dc.contributor.author: Wickramasinghe, Chammi I.
dc.contributor.author: Li, Sufeng
dc.date.accessioned: 2017-06-15T23:32:51Z
dc.date.available: 2017-06-15T23:32:51Z
dc.date.created: 2007-12-12
dc.date.issued: 2007-09-26
dc.identifier.issn: 1860-5664
dc.identifier.uri: http://edoc.hu-berlin.de/18452/4731
dc.description.abstract: We introduce new stylometry tools based on the sliced conditional compression complexity of literary texts. They are inspired by the nearly optimal application of the incomputable Kolmogorov conditional complexity, which they presumably approximate. Whereas other stylometry tools can occasionally give very close values for different authors, our statistic is apparently strictly minimal for the true author, provided that the query and training texts are sufficiently large, the compressor is sufficiently good, and sampling bias is avoided (as in poll sampling). We tune the statistic and test its performance on attributing the Federalist papers (Madison vs. Hamilton). Our results confirm the earlier attribution of the Federalist papers to Madison by Mosteller and Wallace (1964), who used the Naive Bayes classifier, as well as the same attribution obtained with alternative classifiers such as SVM and the second-order Markov model of language. We then apply our method to the attribution of the early poems from the Shakespeare Canon and of the continuation of Marlowe’s poem ‘Hero and Leander’ ascribed to G. Chapman. [eng]
dc.language.iso: eng
dc.publisher: Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Compression Complexity [eng]
dc.subject: Authorship Attribution [eng]
dc.subject.ddc: 330 Wirtschaft
dc.title: Conditional Complexity of Compression for Authorship Attribution
dc.type: workingPaper
dc.identifier.urn: urn:nbn:de:kobv:11-10082346
dc.identifier.doi: http://dx.doi.org/10.18452/4079
local.edoc.pages: 38
local.edoc.type-name: Diskussionspapier
local.edoc.container-type: series
local.edoc.container-type-name: Schriftenreihe
local.edoc.container-year: 2007
dc.identifier.zdb: 2195055-6
bua.series.name: Sonderforschungsbereich 649: Ökonomisches Risiko
bua.series.issuenumber: 2007,57
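
The abstract's central statistic, the sliced conditional compression complexity, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the conditional complexity of a query given a training text is approximated by the growth in compressed size when the query is appended to the training text, that slices are consecutive equal-length chunks of the query, and that the per-author score is the mean over slices. The compressor (`zlib`), the slice length, and all function names are illustrative assumptions; the record does not state which compressor or slicing scheme the paper uses.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (compression level 9)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(training: bytes, query: bytes) -> int:
    """Approximate C(query | training) as
    |compress(training + query)| - |compress(training)|.

    Assumption: zlib stands in for a "sufficiently good" compressor.
    Its 32 KiB window limits how much of a long training text it can use.
    """
    return compressed_size(training + query) - compressed_size(training)


def sliced_ccc(training: bytes, query: bytes, slice_len: int) -> float:
    """Mean conditional complexity over consecutive, non-overlapping
    slices of the query, each exactly slice_len bytes long.
    A trailing remainder shorter than slice_len is ignored."""
    starts = range(0, len(query) - slice_len + 1, slice_len)
    slices = [query[i:i + slice_len] for i in starts]
    if not slices:
        raise ValueError("query is shorter than slice_len")
    total = sum(conditional_complexity(training, s) for s in slices)
    return total / len(slices)


def attribute(query: bytes, candidates: Mapping[str, bytes],
              slice_len: int = 2000) -> str:
    """Return the candidate whose training text gives the smallest
    mean sliced conditional complexity for the query."""
    scores = {name: sliced_ccc(text, query, slice_len)
              for name, text in candidates.items()}
    return min(scores, key=scores.get)
```

Under these assumptions, a call such as `attribute(query_text, {"Madison": madison_text, "Hamilton": hamilton_text})` returns the name with the smallest score, where the three byte strings are hypothetical inputs supplied by the caller. Equal-sized training texts and a stronger compressor would be natural refinements, in line with the abstract's conditions on text size, compressor quality, and sampling bias.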