Harindu's Personal Blog: Genome Database Systems

Hey guys! As I promised yesterday, today I will give you a deeper analysis of genome database systems using a few criteria.

Deeper Analysis into Genome Database Systems

Typologies of recoverable data

Genomic databases contain a large set of data types. The data recoverable from these genome databases can be divided into six categories. The first is Genomic segments, which include all the nucleotide subsequences that are meaningful from a biological point of view, such as genes, clones etc. The second is Maps, which are the "result from projects that produced the sequencing and mapping of the DNA of diverse organisms, such as the Human Genome Project". The third is Variants and mutations, which are alterations that occur at low frequency in DNA and proteins. The fourth is Pathways, which describe the interactions of genes. The fifth is Expression data, which are experimental data about the different levels of expression of genes. The final category is Bibliographic references, which are repositories of relevant biological literature.
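To make the typology a bit more concrete, here is a small Python sketch that models the six categories as an enumeration and groups records by category. The `DataCategory` and `Record` names and the sample records are hypothetical; real genome databases store these data types in their own schemas.

```python
from dataclasses import dataclass
from enum import Enum


class DataCategory(Enum):
    """The six categories of recoverable data (names are hypothetical)."""
    GENOMIC_SEGMENT = "genomic segment"        # genes, clones, other meaningful subsequences
    MAP = "map"                                # results of sequencing/mapping projects
    VARIANT = "variant or mutation"            # low-frequency alterations in DNA or proteins
    PATHWAY = "pathway"                        # interactions of genes
    EXPRESSION = "expression data"             # experimental gene expression levels
    BIBLIOGRAPHIC = "bibliographic reference"  # relevant biological literature


@dataclass
class Record:
    """A hypothetical database record tagged with its data category."""
    identifier: str
    category: DataCategory
    description: str


def group_by_category(records):
    """Group records into lists keyed by DataCategory (empty list if none)."""
    groups = {category: [] for category in DataCategory}
    for record in records:
        groups[record.category].append(record)
    return groups


records = [
    Record("gene-001", DataCategory.GENOMIC_SEGMENT, "example gene"),
    Record("snp-042", DataCategory.VARIANT, "example point mutation"),
    Record("ref-007", DataCategory.BIBLIOGRAPHIC, "example journal article"),
]
groups = group_by_category(records)
print(len(groups))                                           # 6
print([r.identifier for r in groups[DataCategory.VARIANT]])  # ['snp-042']
```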

Database schema types

Most genome databases are based on the relational data model. They mainly use five types of database schemas: the unspecific schema, GUS, the Genolist schema, the Chado schema and the Pathway Tools schema. The unspecific category groups together all the relational and other schemas that do not follow any of the other four. The Genomics Unified Schema (GUS) is a relational schema suited to large sets of information and is widely used in genome database systems. The Genolist schema is also relational and is used to manage bacterial genome databases. Chado is a schema with "an extensible module structure"; it is used in the Beetle Bees Genome Database. Finally, the Pathway Tools schema is an object schema based on an ontology that defines a large set of classes, attributes and relations to model genome databases.
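As a rough illustration of what a relational genome schema looks like, here is a minimal sketch using Python's built-in `sqlite3` module. The tables (`organism`, `segment`, `segment_property`) are hypothetical and much simpler than GUS, Genolist or Chado; the key-value property table is just one common way, assumed here for illustration, to keep a relational schema extensible without altering its core tables.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the actual GUS, Genolist or Chado tables.
SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    species     TEXT NOT NULL
);
CREATE TABLE segment (
    segment_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    name        TEXT NOT NULL,
    kind        TEXT NOT NULL,   -- e.g. 'gene' or 'clone'
    residues    TEXT             -- nucleotide sequence
);
-- Name/value properties let new attributes be added without altering 'segment'.
CREATE TABLE segment_property (
    segment_id  INTEGER NOT NULL REFERENCES segment(segment_id),
    prop_name   TEXT NOT NULL,
    prop_value  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO organism VALUES (?, ?)", (1, "Escherichia coli"))
conn.execute(
    "INSERT INTO segment VALUES (?, ?, ?, ?, ?)",
    (1, 1, "geneA", "gene", "ATGAAACGCATTAG"),
)
conn.execute("INSERT INTO segment_property VALUES (?, ?, ?)", (1, "strand", "+"))
conn.commit()

# Join the core tables with the extensible property table.
rows = conn.execute(
    """
    SELECT s.name, o.species, p.prop_value
    FROM segment AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    JOIN segment_property AS p ON p.segment_id = s.segment_id
    WHERE p.prop_name = 'strand'
    """
).fetchall()
print(rows)  # [('geneA', 'Escherichia coli', '+')]
```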

Query types

Genome database systems support three main types of queries: simple queries, batch queries and analysis queries. A simple query recovers data "satisfying some standard search parameters." A batch query consists of a set of simple queries that are processed simultaneously. The most relevant and complex type is the analysis query, which can be further divided into two categories: pattern search and sequence similarity. A pattern search takes a pattern and a DNA sequence as input and returns "those subsequences of DNA sequence, which turn out to be most strongly related to the input pattern". A sequence similarity query takes a DNA sequence as input and returns those "sequences found in the database that are the most similar to the input sequence".
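The query types above can be sketched over a tiny in-memory collection of sequences. This is only an illustration under assumptions: the `DATABASE` contents and all function names are hypothetical, pattern search is reduced to exact regular-expression matching, and sequence similarity uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for the alignment-based scoring that real systems use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical in-memory "database" of DNA sequences with one metadata field.
DATABASE = {
    "seq1": {"organism": "E. coli", "sequence": "ATGCGTACGTTAGC"},
    "seq2": {"organism": "E. coli", "sequence": "ATGCGTACGATAGC"},
    "seq3": {"organism": "B. subtilis", "sequence": "TTTTGGGGCCCCAA"},
}


def simple_query(**criteria):
    """Return IDs of entries whose fields equal every given search parameter."""
    return [
        seq_id
        for seq_id, entry in DATABASE.items()
        if all(entry.get(field) == value for field, value in criteria.items())
    ]


def batch_query(list_of_criteria):
    """Run several simple queries together and return their results in order."""
    return [simple_query(**criteria) for criteria in list_of_criteria]


def pattern_search(pattern, sequence):
    """Return (start, match) for every regex match, overlapping matches included."""
    # A zero-width lookahead with a capturing group reports overlapping hits.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", sequence)]


def most_similar(query, top_n=1):
    """Rank stored sequences by difflib similarity ratio (stand-in for alignment)."""
    scored = [
        (SequenceMatcher(None, query, entry["sequence"]).ratio(), seq_id)
        for seq_id, entry in DATABASE.items()
    ]
    scored.sort(reverse=True)
    return [seq_id for _, seq_id in scored[:top_n]]


print(simple_query(organism="E. coli"))                # ['seq1', 'seq2']
print(batch_query([{"organism": "E. coli"}, {"organism": "B. subtilis"}]))
# [['seq1', 'seq2'], ['seq3']]
print(pattern_search("CG[TA]", "ATGCGTACGATAGC"))      # [(3, 'CGT'), (7, 'CGA')]
print(most_similar("ATGCGTACGTTAGG"))                   # ['seq1']
```

A real pattern search would score approximate matches rather than require exact ones, and a real similarity search would use a dedicated alignment tool; the sketch only shows the shape of the inputs and outputs.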

Query methods

Genome database systems offer four main query methods: text-based queries, graphical-interaction-based queries, sequence-based queries and query-language-based queries. Text-based methods are the most common; with them, the user specifies sets of words and can combine them with logical operators. The next most common method is graphical-interaction-based querying. "A large set of genomic information can be visualized by physical and genetic maps, whereby queries can be formulated by interacting with graphical objects representing the annotated genomic data." In sequence-based queries, a nucleotide sequence and a data mining algorithm are given as input, and alignments and related information are returned as output. Query-language-based queries use languages such as SQL to interact directly with the database.
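Here is a small sketch contrasting two of these methods: a text-based query that combines words with AND, OR and NOT, and a query-language-based query that sends SQL to a database through Python's `sqlite3` module. The annotation data and the function names are hypothetical.

```python
import sqlite3

# Hypothetical annotation records: (identifier, free-text description).
ANNOTATIONS = [
    ("geneA", "DNA repair protein, ATP binding"),
    ("geneB", "ribosomal protein L7"),
    ("geneC", "DNA polymerase subunit"),
]


def text_query(all_of=(), any_of=(), none_of=()):
    """Keyword search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for identifier, description in ANNOTATIONS:
        words = set(description.lower().replace(",", " ").split())
        has = lambda w: w.lower() in words
        if (all(has(w) for w in all_of)
                and (not any_of or any(has(w) for w in any_of))
                and not any(has(w) for w in none_of)):
            hits.append(identifier)
    return hits


# Query-language-based access: the same data loaded into SQLite and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (identifier TEXT PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO annotation VALUES (?, ?)", ANNOTATIONS)


def sql_query(keyword):
    """Return identifiers whose description contains keyword (LIKE is ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT identifier FROM annotation WHERE description LIKE ? ORDER BY identifier",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


print(text_query(all_of=["dna"], none_of=["polymerase"]))  # ['geneA']
print(text_query(any_of=["ribosomal", "polymerase"]))      # ['geneB', 'geneC']
print(sql_query("protein"))                                 # ['geneA', 'geneB']
```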

Export Formats

Genome databases use several formats to represent query results; HTML, XML, FASTA and flat files are some examples.
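FASTA is the simplest of these formats: each record is a header line starting with `>` followed by the sequence itself. Below is a minimal exporter sketch; the `to_fasta` name, the tuple layout of the records and the default line width of 60 characters are my assumptions, not something a particular database mandates.

```python
def to_fasta(records, width=60):
    """Serialize (identifier, description, sequence) tuples as FASTA text."""
    lines = []
    for identifier, description, sequence in records:
        # Header line: '>' + identifier, then an optional free-text description.
        lines.append(f">{identifier} {description}".rstrip())
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


print(to_fasta([("seq1", "example gene", "ATGCGTACGTTAGC")], width=10), end="")
# >seq1 example gene
# ATGCGTACGT
# TAGC
```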

From the next post onward, I hope to discuss some major research done in the field of genome database systems. Till then, take care, everyone. Goodbye!


Sunday, May 4, 2014

Genome Database Systems - Deeper Analysis