Deeper Analysis into Genome Database Systems
- Typologies of recoverable data
Genome databases contain a large set of data types. The data recoverable from them
can be divided into six categories. The first is Genomic segments, which include
all the nucleotide subsequences that are meaningful from a biological point of
view, such as genes and clones. The second is Maps, which “result from projects
that produced the sequencing and mapping of the DNA of diverse organisms, such as
the Human Genome Project”. The third is Variants and mutations, which are
alterations that occur at low frequency in DNA and proteins. The fourth is
Pathways, which describe the interactions of genes. The fifth is Expression data,
which are experimental data about the different expression levels of genes. The
final category is Bibliographic references, which are repositories of relevant
biological literature.
- Database schema types
Most genome databases follow the relational data model. There are five main
types of database schema in genome databases: the unspecific schema, GUS, the
Genolist schema, the Chado schema and the Pathway Tool schema. The unspecific
schema category collects generic relational schemas and any other schemas that
do not belong to the named types. The Genomic Unified Schema (GUS) is a
relational schema suited to large sets of information, and it is mainly used in
genome database systems. The Genolist schema is also a relational schema, used
to manage bacterial genome database systems. The Chado schema is a schema which
has “an extensible module structure”; it is used in the Beetle Bees Genome
Database. The Pathway Tool schema is an object schema based on an ontology that
defines a large set of classes, attributes and relations to model genome
databases.
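To make the relational idea a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not GUS, Genolist or Chado: the table names, column names and helper functions are my own hypothetical choices, and the sequences are toy placeholders, not real gene sequences.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory relational database (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE gene (
            gene_id     INTEGER PRIMARY KEY,
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
            symbol      TEXT NOT NULL,
            sequence    TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO organism VALUES (1, 'Escherichia coli')")
    # Placeholder sequences for illustration only.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [(1, 1, "lacZ", "ATGACC"), (2, 1, "lacY", "ATGTAC")],
    )
    return conn


def genes_for(conn, organism_name):
    """Return the gene symbols stored for one organism, sorted by symbol."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene g "
        "JOIN organism o ON o.organism_id = g.organism_id "
        "WHERE o.name = ? ORDER BY g.symbol",
        (organism_name,),
    )
    return [row[0] for row in rows]
```

Calling genes_for(build_demo_db(), "Escherichia coli") looks up genes through the foreign key between the two tables, which is the kind of link a relational genome schema relies on.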
- Query types
In genome database systems we can see three main types of queries: simple
queries, batch queries and analysis queries. A simple query recovers data
“satisfying some standard search parameters.” A batch query consists of a bunch
of simple queries that are processed simultaneously. The most relevant and
complex type is the analysis query, which can be further divided into two
categories: pattern search and sequence similarity. A pattern search takes a
pattern and a DNA sequence as input and returns “those subsequences of DNA
sequence, which turn out to be most strongly related to the input pattern”. A
sequence similarity query takes a DNA sequence as input and returns those
“sequences found in the database that are the most similar to the input
sequence”.
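The two analysis queries can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the SEQUENCES dictionary and both function names are hypothetical, the pattern search uses exact regular-expression matching rather than a "most strongly related" ranking, and difflib's ratio is a generic string similarity, not a biological alignment score like the ones real genome databases use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical in-memory "database": identifier -> DNA sequence.
SEQUENCES = {
    "seq1": "ATGCGTACGTTAGC",
    "seq2": "ATGCGTACGTTAGG",
    "seq3": "TTTTAAAACCCCGG",
}


def pattern_search(pattern, dna):
    """Return (start, subsequence) for every match of a regex pattern in dna."""
    # The lookahead lets overlapping matches be reported as well.
    finder = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in finder.finditer(dna)]


def most_similar(query, database, top=2):
    """Return the identifiers of the `top` sequences most similar to query."""
    scored = [
        (SequenceMatcher(None, query, seq).ratio(), name)
        for name, seq in database.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top]]
```

For example, pattern_search("CG", SEQUENCES["seq1"]) reports the two places where "CG" occurs, and most_similar(SEQUENCES["seq1"], SEQUENCES) ranks seq1 itself first, followed by seq2, which differs by one base.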
- Query methods

- Export formats
Genome databases use several formats to represent query results. HTML, XML,
FASTA and flat files are some examples.
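As an example of one of these formats, FASTA writes each record as a header line starting with ">" followed by the sequence on one or more lines. The small sketch below shows how query results might be exported that way; the to_fasta function, its parameters and the 60-character line width are my own assumptions, not taken from any particular database.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text.

    `width` is the maximum number of sequence characters per line.
    """
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Split the sequence into fixed-width chunks.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

For instance, to_fasta([("seq1", "ACGTACGTAC")], width=4) produces a header line ">seq1" followed by the lines ACGT, ACGT and AC.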
From the next post onward I hope to discuss some of the major research done in the field of genome database systems. Till then, take care everyone. Goodbye!