Friday, May 2, 2014

Genome database systems - Overview

Hi guys, after some time I'm back to blogging. This time I thought of writing about a very demanding and growing area of bioinformatics: genome databases. For the last six months I have been working on genome database systems for my independent study. In this post I would like to give a short overview of genome database systems and some of their characteristics.

Overview of Genome Database Systems 

The study of genes and proteins, better known as genomics and proteomics, has become an extremely important area of modern biology. These fields work with very large amounts of biological data, so the databases that hold these data play a vital role in biology and medicine. The term genome refers to the complete set of genetic material present in the cells of an organism. Genomics consists of two component areas: structural genomics and functional genomics. Genome databases store this information, and unlike gene databases, they contain both coding sequences and non-coding intergenic sequences. The following are some examples of genome databases:

  • Saccharomyces Genome Database
  • Mouse Genome Database
  • Human Genome Database
  • European Mutant Mouse Pathology Database
  • MITOMAP
  • Kyoto Encyclopedia of Genes and Genomes (KEGG)

Characteristics of Genome Database Systems


  • Data are highly complex when compared with most other domains and applications.


Genome data are more complex than the data in most other domains. Take the MITOMAP database, which stores the human mitochondrial genome: “This single genome is a small, circular piece of DNA encompassing information about 16,569 nucleotide bases; 52 gene encoding messenger RNA, ribosomal RNA, and transfer RNA; 1000 known population variants; over 60 known disease associations.” Data like these must be stored in a way that computers can process and that biologists can still work with. At first, relational and object-oriented DBMS approaches were used to model these data, but scientists later moved on to their own ways of representing them. Today, however, relational DBMSs are widely used because they make long-term maintenance and curation easier.
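To make the relational option a bit more concrete, here is a minimal Python sketch using the standard sqlite3 module. The table and column names (genome, gene, variant) are my own hypothetical choices, not MITOMAP's actual schema, and apart from the 16,569-base genome length taken from the quote, the rows are made-up placeholders. It shows how a genome, its genes and its variants can live in separate linked tables and still be combined with a join to answer a biologist's question.

```python
import sqlite3

# Hypothetical, simplified schema (not MITOMAP's real one): one genome row,
# with genes and population variants in child tables that reference it.
SCHEMA = """
CREATE TABLE genome (
    genome_id   INTEGER PRIMARY KEY,
    organism    TEXT NOT NULL,
    name        TEXT NOT NULL,
    length_bp   INTEGER NOT NULL,
    is_circular INTEGER NOT NULL
);
CREATE TABLE gene (
    gene_id   INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
    symbol    TEXT NOT NULL,
    rna_type  TEXT NOT NULL CHECK (rna_type IN ('mRNA', 'rRNA', 'tRNA')),
    start_bp  INTEGER NOT NULL,
    end_bp    INTEGER NOT NULL
);
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    genome_id  INTEGER NOT NULL REFERENCES genome(genome_id),
    position   INTEGER NOT NULL,
    ref_base   TEXT NOT NULL,
    alt_base   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Genome length comes from the quote (16,569 bases); the gene and variant rows
# are hypothetical placeholders, not curated records.
conn.execute("INSERT INTO genome VALUES (1, 'Homo sapiens', 'mitochondrial genome', 16569, 1)")
conn.executemany(
    "INSERT INTO gene (genome_id, symbol, rna_type, start_bp, end_bp) VALUES (?, ?, ?, ?, ?)",
    [(1, "GENE_A", "rRNA", 100, 900), (1, "GENE_B", "tRNA", 950, 1020)],
)
conn.execute(
    "INSERT INTO variant (genome_id, position, ref_base, alt_base) VALUES (1, 500, 'A', 'G')"
)

# A biologist's question as a join: which genes contain each variant?
# Simplified: ignores genes that wrap around the origin of the circular molecule.
rows = conn.execute("""
    SELECT v.position, g.symbol, g.rna_type
    FROM variant v JOIN gene g
      ON g.genome_id = v.genome_id
     AND v.position BETWEEN g.start_bp AND g.end_bp
""").fetchall()
print(rows)  # [(500, 'GENE_A', 'rRNA')]
```

Splitting the data into linked tables is part of what makes long-term curation manageable: a corrected gene coordinate is updated in one row, and every query that joins on it sees the fix.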

  • Schemas change at a rapid pace.


     To manage the flow of information between successive releases of a database, there should therefore be features that support data object migration and schema evolution. However, most relational and object databases assume a fixed schema. To cope with this, some databases, such as GenBank, publish a new schema release only every two or three years.
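Here is a minimal sketch of what schema evolution support can look like, again in Python with sqlite3. The version-numbered MIGRATIONS dictionary, the migrate function and the sequence_record table are hypothetical names. The sketch records the current schema version in SQLite's PRAGMA user_version and applies each upgrade step in its own transaction, migrating existing rows as it goes. This is not how GenBank manages its releases, just one common way to structure versioned schema changes.

```python
import sqlite3

# Hypothetical migration steps: each version number maps to the SQL
# statements that upgrade the schema from the previous version.
MIGRATIONS = {
    1: ["CREATE TABLE sequence_record (accession TEXT PRIMARY KEY, sequence TEXT NOT NULL)"],
    # Version 2 adds a column and migrates existing rows by back-filling it.
    2: [
        "ALTER TABLE sequence_record ADD COLUMN length_bp INTEGER",
        "UPDATE sequence_record SET length_bp = length(sequence)",
    ],
}


def schema_version(conn):
    # SQLite reserves an integer in the database header for application use.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Apply pending migrations in order, one transaction per version."""
    for version in range(schema_version(conn) + 1, target + 1):
        conn.execute("BEGIN")
        try:
            for statement in MIGRATIONS[version]:
                conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return schema_version(conn)


# isolation_level=None lets the explicit BEGIN/COMMIT above control transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrate(conn, 1)
conn.execute("INSERT INTO sequence_record VALUES ('demo_001', 'ACGTACGT')")
migrate(conn, 2)  # the existing record is carried into the new schema
print(conn.execute("SELECT accession, length_bp FROM sequence_record").fetchall())
# [('demo_001', 8)]
```

This only covers additive changes such as new columns; renaming or splitting data objects would need hand-written migration steps of the same shape.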

  •  Representations of the same data by different biologists will likely be different.

    Therefore, there should be mechanisms to keep the database uniform. One approach has been to use queries that can interrelate and link the different schemas.
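One way to link two different representations with a query is a SQL view that maps both onto a common shape. In this hypothetical Python/sqlite3 sketch, two labs describe the same kind of fact (a gene's location) with different column names and different encodings (start/end versus start/length), and the unified_gene view translates both. All table, view and gene names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lab A: location stored as chromosome plus start/end coordinates.
CREATE TABLE lab_a_gene (name TEXT, chromosome TEXT, start_pos INTEGER, end_pos INTEGER);
-- Hypothetical lab B: same facts, different column names, start/length encoding.
CREATE TABLE lab_b_locus (locus_id TEXT, chrom TEXT, begin_at INTEGER, length_bp INTEGER);

-- The view maps both representations onto one uniform shape.
CREATE VIEW unified_gene AS
    SELECT name AS gene, chromosome, start_pos AS start_bp, end_pos AS end_bp,
           'lab_a' AS source
    FROM lab_a_gene
    UNION ALL
    SELECT locus_id, chrom, begin_at, begin_at + length_bp - 1, 'lab_b'
    FROM lab_b_locus;
""")
conn.execute("INSERT INTO lab_a_gene VALUES ('geneX', 'chr1', 1000, 1999)")
conn.execute("INSERT INTO lab_b_locus VALUES ('geneY', 'chr1', 5000, 500)")

rows = conn.execute(
    "SELECT gene, start_bp, end_bp, source FROM unified_gene "
    "WHERE chromosome = ? ORDER BY start_bp",
    ("chr1",),
).fetchall()
print(rows)  # [('geneX', 1000, 1999, 'lab_a'), ('geneY', 5000, 5499, 'lab_b')]
```

Because the mapping lives in the view, each lab can keep its own representation while queries against unified_gene see uniform data.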

  • The amount and range of variability in the data is high. Hence, biological database systems must be flexible in handling data types and values.
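One common pattern for this kind of flexibility (my own illustration, not something the point above prescribes) is an entity-attribute-value (EAV) table, where every fact is a row holding an entity, an attribute name and a value. The sketch below relies on SQLite's dynamic typing so that the single value column can hold text, integers or reals; the annotation table and the sample attributes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV table: new kinds of attributes need no schema change, and
# the untyped "value" column keeps each value's own type (text, integer, real).
conn.execute("CREATE TABLE annotation (entity TEXT, attribute TEXT, value)")
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [
        ("sample_1", "tissue", "liver"),
        ("sample_1", "read_count", 1200),
        ("sample_2", "gc_fraction", 0.41),
    ],
)

# typeof() shows the storage class SQLite actually kept for each value.
rows = conn.execute(
    "SELECT attribute, value, typeof(value) FROM annotation "
    "WHERE entity = ? ORDER BY attribute",
    ("sample_1",),
).fetchall()
print(rows)  # [('read_count', 1200, 'integer'), ('tissue', 'liver', 'text')]
```

The trade-off is that the database can no longer enforce types or required attributes for you, and reassembling a full record takes more complicated queries, so EAV is usually kept for the genuinely variable part of the data.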


  •  Defining and executing complex queries.

   These databases are mainly used by biologists, who often have little technical knowledge of how queries are structured or how the data are stored. Therefore, simple interfaces with integrated query templates should be provided.
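A query template can be as simple as a named, pre-written SQL statement with slots the user fills in. Below is a minimal Python sketch of that idea; QUERY_TEMPLATES, run_template and the gene table are hypothetical names. The user only picks a template and supplies values, and those values are passed as bound parameters rather than pasted into the SQL text, which also guards against malformed or malicious input.

```python
import sqlite3

# Hypothetical template catalogue: SQL written once by a developer,
# plus the set of named parameters each template needs.
QUERY_TEMPLATES = {
    "genes_on_chromosome": (
        "SELECT name FROM gene WHERE chromosome = :chromosome ORDER BY name",
        {"chromosome"},
    ),
    "genes_in_region": (
        "SELECT name FROM gene WHERE chromosome = :chromosome "
        "AND start_bp <= :end AND end_bp >= :start ORDER BY name",
        {"chromosome", "start", "end"},
    ),
}


def run_template(conn, template_name, **params):
    """Run a predefined query; values are bound as parameters, never spliced into SQL."""
    sql, required = QUERY_TEMPLATES[template_name]
    missing = required - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (name TEXT, chromosome TEXT, start_bp INTEGER, end_bp INTEGER)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [("geneA", "chr1", 100, 900), ("geneB", "chr1", 5000, 6000), ("geneC", "chr2", 100, 400)],
)
print(run_template(conn, "genes_on_chromosome", chromosome="chr1"))  # ['geneA', 'geneB']
print(run_template(conn, "genes_in_region", chromosome="chr1", start=800, end=1000))  # ['geneA']
```

A web form or drop-down menu would sit on top of run_template, so the biologist never has to see the SQL at all.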





So guys, I hope this information will be helpful if you are a bioinformatics enthusiast like me and craving more insight into this field. In the next post I would like to discuss genome database systems in more detail. Till then, goodbye. Learn and empower yourselves.


