| edoc Server of Humboldt-Universität zu Berlin |
| Publication type: | Workshop or conference contribution |
| Author(s): | Heiko Müller; Felix Naumann; Johann-Christoph Freytag |
| Title: | Data Quality in Genome Databases |
| Published in: | Eighth International Conference on Information Quality (IQ 2003), 2003, pp. 269-284 |
| Event: | 8th IQ 2003, MIT Sloan School of Management, Cambridge, MA, USA, 7 November 2003 - 9 November 2003 |
| Publisher: | IQ http://www.iqconference.org/ |
| Place of publication: | Cambridge, MA, USA |
| First published: | 1 November 2003 |
| Published on edoc: | 2 July 2006 |
| Status: | published, peer-reviewed |
| Full text: | pdf (urn:nbn:de:kobv:11-10065636) |
| URL of first publication: | http://www.iqconference.org/iciq/iqdownload.aspx?ICIQYear=2003&File=DataQualityinGenomeDatabases.pdf |
| Subject area(s): | Computer Science |
| Keywords (eng): | Data Mining, Data Conflicts, Data Cleansing, Molecular Biology, Data Errors |
| Institution: | Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II |
| Abstract (eng): |
| Genome databases store data about molecular biological entities such as genes, proteins, diseases, etc. The main purpose of creating and maintaining such databases in commercial organizations is their importance in the process of drug discovery. Genome data is analyzed and interpreted to gain so-called leads, i.e., promising structures for new drugs. Following a lead through the process of drug development, testing, and finally several stages of clinical trials is extremely expensive. Thus, a high-quality underlying database is of utmost importance. Due to their exploratory nature, genome databases, both commercial and public, are inaccurate, incomplete, outdated, and in an overall poor state. This paper highlights the important challenges of determining and improving data quality for databases storing molecular biological data. We examine the production process for genome data in detail and show that producing incorrect data is intrinsic to the process; at the same time, we highlight common types of data errors. We compare these error classes with existing solutions for data cleansing and conclude that traditional and proven data cleansing techniques from other application domains do not suffice for the particular needs and problem types of genomic databases. |