2003-11-01 Conference publication DOI: 10.18452/9205
Data Quality in Genome Databases
dc.contributor.author: Müller, Heiko
dc.contributor.author: Naumann, Felix
dc.contributor.author: Freytag, Johann-Christoph
dc.date.accessioned: 2017-06-17T00:21:28Z
dc.date.available: 2017-06-17T00:21:28Z
dc.date.created: 2006-07-02
dc.date.issued: 2003-11-01
dc.identifier.other: http://www.iqconference.org/iciq/iqdownload.aspx?ICIQYear=2003&File=DataQualityinGenomeDatabases.pdf
dc.identifier.uri: http://edoc.hu-berlin.de/18452/9857
dc.description.abstract: Genome databases store data about molecular biological entities such as genes, proteins, diseases, etc. The main purpose of creating and maintaining such databases in commercial organizations is their importance in the process of drug discovery. Genome data is analyzed and interpreted to gain so-called leads, i.e., promising structures for new drugs. Following a lead through the process of drug development, testing, and finally several stages of clinical trials is extremely expensive. Thus, an underlying high-quality database is of utmost importance. Due to the exploratory nature of genome databases, commercial and public, they are inaccurate, incomplete, outdated and in an overall poor state. This paper highlights the important challenges of determining and improving data quality for databases storing molecular biological data. We examine the production process for genome data in detail and show that producing incorrect data is intrinsic to the process; at the same time, we highlight common types of data errors. We compare these error classes with existing solutions for data cleansing and come to the conclusion that traditional and proven data cleansing techniques of other application domains do not suffice for the particular needs and problem types of genomic databases. [eng]
dc.language.iso: eng
dc.publisher: Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II
dc.subject: Data Mining [eng]
dc.subject: Data Conflicts [eng]
dc.subject: Data Cleansing [eng]
dc.subject: Molecular Biology [eng]
dc.subject: Data Errors [eng]
dc.subject.ddc: 004 Computer science
dc.title: Data Quality in Genome Databases
dc.type: conferenceObject
dc.identifier.urn: urn:nbn:de:kobv:11-10065636
dc.identifier.doi: http://dx.doi.org/10.18452/9205
local.edoc.container-title: 8. IQ 2003
local.edoc.container-title: Eighth International Conference on Information Quality (IQ 2003)
local.edoc.fp-subtype: paper
local.edoc.type-name: Conference publication
local.edoc.institution: Mathematisch-Naturwissenschaftliche Fakultät II
local.edoc.container-type: conference
local.edoc.container-type-name: Conference
local.edoc.container-url: http://www.iqconference.org/
local.edoc.container-publisher-name: IQ
local.edoc.container-publisher-place: Cambridge, MA, USA
local.edoc.container-event: Eighth International Conference on Information Quality (IQ 2003), 2003, pp 269-284, 8. IQ 2003, MIT Sloan School of Management, Cambridge, MA, USA, 07.11.2003 - 09.11.2003
local.edoc.container-year: 2003
local.edoc.container-firstpage: 269
local.edoc.container-lastpage: 284
dc.description.version: Peer Reviewed
