Harindu's Personal Blog: Genome Database Systems - Important Areas in Data Management

Hey guys, today I will be discussing a very important aspect of genome database systems: data management.

Important areas in Data Management in Genome Database Systems

Nonstandard and unstructured data.

Most genome data is nonstandard and unstructured. It is not clear that every position in a DNA sequence should be treated as a data object. There is structural data such as proteins, but it needs a 3D representation. Techniques from GIS (geographic information systems) and CAD (computer-aided design), as well as from geometric modeling, need to be applied for efficient indexing and querying. Scientists currently use tools like BLAST or PSI-BLAST to do pattern searches, and this capability needs to be integrated into the DBMS.
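To make the indexing idea a bit more concrete, here is a minimal Python sketch of a k-mer index, which is one simple way to speed up pattern searches over sequences. It is only an illustration and not how BLAST or any particular DBMS works internally; the function names `build_kmer_index` and `find_pattern` and the sample sequences are made up for this example.

```python
from collections import defaultdict

def build_kmer_index(sequences, k=3):
    """Map every k-length substring to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index

def find_pattern(index, sequences, pattern, k=3):
    """Use the first k-mer of the pattern as a seed, then verify each candidate hit."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    hits = []
    for seq_id, offset in index.get(pattern[:k], []):
        # The index only narrows down candidates; the full pattern is checked here.
        if sequences[seq_id].startswith(pattern, offset):
            hits.append((seq_id, offset))
    return hits

# Hypothetical toy data, not real genome records.
sequences = {"seq1": "ACGTACGTGA", "seq2": "TTACGTGACC"}
index = build_kmer_index(sequences, k=3)
hits = find_pattern(index, sequences, "ACGTGA", k=3)
print(hits)
```

The point of the index is that a search only touches the positions that share a seed with the pattern, instead of scanning every sequence from start to end.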

Complex query processing.

As discussed in the characteristics of genome databases, queries often depend on the similarity of sequences, graphs and 3D shapes, which makes them really hard to implement. Relational DBMSs and object DBMSs are not capable of processing these types of queries. Therefore DBMS developers have implemented path-oriented queries and specialized libraries to cater to this requirement.
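A small example may help show why similarity queries differ from the exact-match lookups a typical DBMS is built for. The sketch below, in Python, ranks stored sequences by edit distance to a query sequence. The names `edit_distance` and `similar_sequences` and the sample records are assumptions for illustration only; real systems use far more specialized algorithms.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def similar_sequences(records, query, max_distance):
    """Return (distance, name) pairs for records within max_distance, closest first."""
    scored = ((edit_distance(seq, query), name) for name, seq in records.items())
    return sorted(pair for pair in scored if pair[0] <= max_distance)

# Hypothetical toy records, not real genes.
records = {"geneA": "ACGTAC", "geneB": "ACGTTC", "geneC": "TTTTTT"}
matches = similar_sequences(records, "ACGTAC", max_distance=1)
print(matches)
```

An exact-match lookup would find only `geneA` here, while the similarity query also returns `geneB`, which differs by a single base.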

Data interpretation and metadata management.

Decent mechanisms should be implemented in the database system to manage the metadata. In a database system like this, enough metadata must be provided to scientists for interpretation purposes, and it needs to be maintained carefully. To do that, several techniques have been implemented, such as the use of annotations and ontologies.
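As a rough illustration of how annotations and an ontology can work together, here is a tiny Python sketch. Each gene is annotated with one term, and the ontology records which term is a kind of which other term, so a query for a general term also finds genes annotated with more specific ones. The terms, gene names and function names are all made up for this example and do not come from any real ontology.

```python
# Toy "is a" hierarchy: each term points to its more general parent term.
ontology_parents = {
    "kinase": "enzyme",
    "enzyme": "protein",
    "protein": None,
}

# Toy annotations: each gene is tagged with a single ontology term.
annotations = {"geneA": "kinase", "geneB": "enzyme", "geneC": "protein"}

def ancestors(term):
    """Return the term itself followed by all of its more general terms."""
    chain = []
    while term is not None:
        chain.append(term)
        term = ontology_parents[term]
    return chain

def genes_annotated_with(term):
    """Find genes whose annotation is the given term or a more specific one."""
    return sorted(gene for gene, tag in annotations.items() if term in ancestors(tag))

enzyme_genes = genes_annotated_with("enzyme")
print(enzyme_genes)
```

Without the ontology, a search for "enzyme" would miss `geneA`, because its annotation only says "kinase".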

Data integration across related databases.

Various genome databases are interrelated, and there should be a proper management tool to handle these kinds of cross-database links. Currently no uniform interfaces exist, and no consolidation of data has been done, that would let information be accessed in an integrated fashion in any given context or by any particular classification.
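To picture what handling cross-database links could look like, here is a minimal Python sketch that follows a reference from one database into another and reports links that cannot be resolved. Both databases, their field names and the `integrated_view` function are hypothetical and exist only for this illustration.

```python
# Two hypothetical, separately maintained databases.
sequence_db = {
    "SEQ1": {"sequence": "ACGTAC", "protein_ref": "PROT1"},
    "SEQ2": {"sequence": "TTGACA", "protein_ref": "PROT9"},  # PROT9 does not exist
}
protein_db = {
    "PROT1": {"name": "example protein", "length": 120},
}

def integrated_view(seq_id):
    """Combine a sequence record with the protein record it links to, if any."""
    record = dict(sequence_db[seq_id])  # copy so the source database is not modified
    linked = protein_db.get(record["protein_ref"])
    record["protein"] = linked
    record["link_resolved"] = linked is not None
    return record

def dangling_links():
    """List sequence ids whose cross-reference points at a missing protein record."""
    return sorted(seq_id for seq_id, rec in sequence_db.items()
                  if rec["protein_ref"] not in protein_db)

view = integrated_view("SEQ1")
broken = dangling_links()
print(view["protein"]["name"], broken)
```

Even in this toy case, the two databases can drift apart, which is why some tool has to keep checking that the links between them still point at something.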

Need for a set of uniform data management solutions.

There is a tremendous need to have this kind of uniform data management solution because of “typical problems in databases of heterogeneous data integration - multiple models, multiple formats, different underlying files and database systems, and a large amount of context-sensitive semantic content.”

This is a brief explanation of the key areas of data management in genome database systems. In the next post I hope to give you guys a much deeper analysis of genome database systems. Till then, goodbye folks. :D

Saturday, May 3, 2014
