Harindu's Personal Blog: bioinformatics

Sunday, May 4, 2014

Genome Database Systems - Deeper Analysis

Hey guys! As I promised yesterday, today I will give you a deeper analysis of genome database systems using a few criteria.

Deeper Analysis into Genome Database Systems

Typologies of recoverable data

Genomic databases contain a large set of data types. The data recoverable from them can be divided into six categories. The first is genomic segments, which include all the nucleotide subsequences that are meaningful from a biological point of view, such as genes and clones. The second is maps, which “result from projects that produced the sequencing and mapping of the DNA of diverse organisms, such as the Human Genome Project”. The third is variants and mutations, which are alterations that occur at low frequency in DNA and proteins. The fourth is pathways, which describe the interactions of genes. The fifth is expression data, which are experimental data about the different levels of expression of genes. The final category is bibliographic references, which are repositories of relevant biological literature.

Database schema types

Most genome databases follow the relational data model. There are five main types of database schema in genome databases: unspecific schema, GUS, Genolist schema, Chado schema and Pathway Tools schema. The unspecific category covers all relational schemas, and any other schemas, that do not belong to one of the named types. The Genomics Unified Schema (GUS) is a relational schema suited to large sets of information, and it is mainly used in genome database systems. The Genolist schema is also a relational schema, used to manage bacterial genome database systems. The Chado schema is a schema which has “an extensible module structure”; it is used in Beetle Bees Genome Database. The Pathway Tools schema is an object schema based on an ontology that defines a large set of classes, attributes and relations to model genome databases.

Query types

In genome database systems we can see three main types of query: simple queries, batch queries and analysis queries. A simple query recovers data “satisfying some standard search parameters.” A batch query consists of a bunch of simple queries that are processed simultaneously. The most relevant and complex type is the analysis query, which can be further divided into two categories: pattern search and sequence similarity. A pattern search takes a pattern and a DNA sequence as input and returns “those subsequences of DNA sequence, which turn out to be most strongly related to the input pattern”. A sequence similarity query takes a DNA sequence as input and returns those “sequences found in the database that are the most similar to the input sequence”.
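
To make the two analysis query types a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names are hypothetical, the pattern search is a plain regular expression match, and the similarity measure is a simple shared k-mer ratio. Real genome databases rely on dedicated tools such as BLAST for this.

```python
import re


def pattern_search(pattern, sequence):
    """Return (start, match) pairs for every occurrence of a regex pattern.

    The lookahead lets overlapping occurrences be reported as well.
    """
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence)]


def kmers(sequence, k=3):
    """Set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(a, b, k=3):
    """Toy similarity: shared k-mers divided by all k-mers (Jaccard index)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def most_similar(query, database, top=2, k=3):
    """Names of the `top` database sequences most similar to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: similarity(query, item[1], k),
                    reverse=True)
    return [name for name, _ in ranked[:top]]


# Made-up sequences, for illustration only.
toy_db = {"s1": "ATGCGTAC", "s2": "TTTTTTTT", "s3": "ATGCGTTC"}
print(pattern_search("ATA", "GATATAC"))
print(most_similar("ATGCGTAC", toy_db))
```

The pattern search answers "where does this motif occur in my sequence", while the similarity query ranks whole database entries against the input sequence.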

Query methods

In genome database systems there are four main query methods: text-based queries, graphical interaction based queries, sequence-based queries and query language based queries. The most common are the text-based ones, where the user can specify sets of words and combine them with logical operators. The next most common method is graphical interaction based queries. “A large set of genomic information can be visualized by physical and genetic maps, whereby queries can be formulated by interacting with graphical objects representing the annotated genomic data.” In sequence-based queries, a nucleotide sequence and a data mining algorithm are given as inputs, and alignments and similar information are retrieved as outputs. Query language based queries use languages such as SQL for low-level interaction with the database.
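
As a rough illustration of how a text-based query with logical operators can end up as a query language based query, here is a small sketch using Python's built-in sqlite3 module. The `gene` table, its columns and the rows are toy data I made up for the example, and `text_query` is a hypothetical helper, not the interface of any real genome database.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a toy `gene` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (name TEXT, organism TEXT, description TEXT)")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        ("CDC28", "Saccharomyces cerevisiae", "cyclin-dependent kinase"),
        ("ACT1", "Saccharomyces cerevisiae", "actin"),
        ("Cdk1", "Mus musculus", "cyclin-dependent kinase 1"),
    ])
    return conn


def text_query(conn, all_words=(), any_words=()):
    """Translate word lists into SQL: all_words are ANDed, any_words are ORed."""
    field = "(description || ' ' || organism) LIKE ?"
    clauses, params = [], []
    for word in all_words:
        clauses.append(field)
        params.append(f"%{word}%")
    if any_words:
        clauses.append("(" + " OR ".join(field for _ in any_words) + ")")
        params.extend(f"%{word}%" for word in any_words)
    where = " AND ".join(clauses) if clauses else "1"
    rows = conn.execute(f"SELECT name FROM gene WHERE {where} ORDER BY name", params)
    return [row[0] for row in rows]


conn = build_db()
print(text_query(conn, all_words=["kinase", "Saccharomyces"]))
print(text_query(conn, any_words=["actin", "Mus"]))
```

The user only types words and chooses AND or OR; the search values are passed as bound parameters, so the user never has to write SQL.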

Export Formats

In genome databases, several formats are used to represent query results. HTML, XML, FASTA and flat files are some examples.
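
To show what two of these formats look like in practice, here is a small sketch that writes the same record as FASTA and as XML. FASTA is a header line starting with ">" followed by the sequence wrapped over several lines. The XML element names (`sequence`, `description`, `residues`) and the record itself are my own assumptions for the example, not a standard schema.

```python
import xml.etree.ElementTree as ET


def to_fasta(record_id, description, sequence, width=60):
    """Format one record as FASTA: a '>' header line, then wrapped sequence lines."""
    header = f">{record_id} {description}".rstrip()
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + chunks) + "\n"


def to_xml(record_id, description, sequence):
    """Format one record as a small XML document (hypothetical element names)."""
    root = ET.Element("sequence", id=record_id)
    ET.SubElement(root, "description").text = description
    ET.SubElement(root, "residues").text = sequence
    return ET.tostring(root, encoding="unicode")


# Made-up record, for illustration only.
print(to_fasta("demo1", "example record", "ATGCATGCAT", width=4))
print(to_xml("demo1", "example record", "ATGCATGCAT"))
```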

From the next post onward I hope to discuss some of the major research done in this field of genome database systems. Till then, take care everyone. Goodbye!

Saturday, May 3, 2014

Genome Database Systems - Important areas in Data Management

Hey guys, today I will be discussing a very important aspect of Genome Database systems. It is Data Management in genome database systems.

Important areas in Data Management in Genome Database Systems

Nonstandard and unstructured data.

Most genome data are nonstandard and unstructured. It is not clear that every position in a DNA sequence should be treated as a data object. There are also structural data, such as proteins, which need a 3D representation. Techniques from GIS (geographic information systems) and CAD (computer-aided design), as well as from geometric modeling, need to be applied for efficient indexing and querying. Scientists currently use tools like BLAST or PSI-BLAST to do pattern searches, and this capability needs to be integrated into the DBMS.

Complex query processing.

As discussed under the characteristics of genome databases, queries have to work on the similarity of sequences, graphs and 3-D shapes, which makes them really hard to implement. Conventional relational and object DBMSs cannot process these types of queries on their own. Therefore DBMS developers have implemented path-oriented queries and specialized libraries to cater for this requirement.

Data interpretation and Meta data management.

Decent mechanisms should be implemented in the database system to manage metadata, because in a system like this enough metadata must be provided to scientists for interpretation purposes, and it needs to be kept well maintained. Several techniques have been used to do that, such as annotations and ontologies.

Data integration across related databases.

Various genome databases are interrelated, and there should be a proper management tool to handle these kinds of cross-database links. Currently there are no uniform interfaces and no consolidation of data that would let information be accessed in an integrated fashion in any given context or by any particular classification.

Need for a set of uniform data management solutions.

There is a tremendous need for this kind of uniform data management solution because of the “typical problems in databases of heterogeneous data integration - multiple models, multiple formats, different underlying files and database systems, and a large amount of context-sensitive semantic content.”

This was a brief explanation of the key areas of data management in genome database systems. In the next post I hope to give you guys a much deeper analysis of genome database systems. Till then, goodbye folks. :D

Friday, May 2, 2014

Genome database systems - Overview

Hi guys, I am back to blogging after some time. This time I thought of writing about a very demanding and growing area of bioinformatics: genome databases. For the last six months I have been working on genome database systems for my independent study. In this post I would like to give a short overview of genome database systems and some of their characteristics.

Overview of Genome Database Systems

The study of genes and proteins has become an extremely important area of modern biology; these studies are better known as genomics and proteomics. These areas routinely use large amounts of biological data, so the databases that contain these data play a vital role in biology and medicine. The term genome refers to the total amount of genetic code present in the cells of an organism. Genomics consists of two component areas: structural genomics and functional genomics. Genome databases store this information and, unlike gene databases, contain both coding and non-coding intergenic sequences. The following are some examples of genome databases.

Saccharomyces Genome Database
Mouse Genome Database
Human Genome Database
European Mutant Mouse Pathology Database
MITOMAP
Kyoto Encyclopedia of Genes and Genomes

Characteristics of Genome Database Systems

Data are highly complex when compared with most other domains and applications.

Compared with the data types of other domains, genome data are among the most complex. This can be explained with the following example. The MITOMAP database stores the human mitochondrial genome. “This single genome is a small, circular piece of DNA encompassing information about 16,569 nucleotide bases; 52 gene encoding messenger RNA, ribosomal RNA, and transfer RNA; 1000 known population variants; over 60 known disease associations.” Data of this type should be stored in a way that can be processed by computers and can also be handled by biologists. At first, relational DBMS and object-oriented DBMS approaches were taken to model these data, but scientists then moved on to their own ways of representing them. Currently, however, relational DBMSs are used for the sake of long-term maintenance and ease of curation.
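
One small example of why such data do not fit plain strings or rows very well: the mitochondrial genome is circular, so a region can run past the last base and continue from the first one. The sketch below is only my own illustration of that idea, with a hypothetical `circular_slice` helper and a made-up eight-base sequence instead of the real 16,569 bases.

```python
def circular_slice(genome, start, length):
    """Return `length` bases starting at 0-based `start`, wrapping around the end."""
    n = len(genome)
    if n == 0 or length < 0 or length > n:
        raise ValueError("length must be between 0 and the genome size")
    start %= n
    end = start + length
    if end <= n:
        return genome[start:end]
    # The region crosses the origin: take the tail, then continue from the head.
    return genome[start:] + genome[:end - n]


toy_genome = "ACGTTGCA"  # made-up sequence, for illustration only
print(circular_slice(toy_genome, 2, 3))
print(circular_slice(toy_genome, 6, 4))
```

The second call crosses the origin, which an ordinary substring operation on a linear sequence would not handle.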

Schemas change at a rapid pace.

Because of this, released databases need features that support data object migration and schema evolution in order to improve information flow management. Most relational and object databases have a fixed schema, so to cope with this some databases publish a new schema release every two or three years, e.g. GenBank.

Representations of the same data by different biologists will likely be different.

Therefore there should be mechanisms to maintain the uniformity of the database. To achieve this, queries that can interrelate and link different schemas have been used.

The amount and range of variability in data is high. Hence, biological systems must be flexible in handling data types and values.

Defining and executing complex queries.

These databases are mainly used by biologists, who do not necessarily have deep technical knowledge of how queries are structured or how the data are stored. Therefore simple interfaces with integrated query templates should be provided.
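
A query template can be as simple as a named, pre-written query where the biologist only fills in the blanks. Here is a minimal sketch of that idea with Python's sqlite3 module. The template names, the `gene` table and its rows (including the lengths) are made-up values for illustration, not taken from any real database.

```python
import sqlite3

# Hypothetical templates: the user picks a name and supplies the parameters.
TEMPLATES = {
    "genes_by_organism":
        "SELECT name FROM gene WHERE organism = :organism ORDER BY name",
    "genes_longer_than":
        "SELECT name FROM gene WHERE length > :min_length ORDER BY name",
}


def run_template(conn, template_name, **params):
    """Run a named template with the given parameters and return gene names."""
    sql = TEMPLATES[template_name]
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """In-memory database with a toy `gene` table (made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (name TEXT, organism TEXT, length INTEGER)")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        ("ACT1", "yeast", 1200),
        ("CDC28", "yeast", 900),
        ("Actb", "mouse", 1100),
    ])
    return conn


conn = demo_connection()
print(run_template(conn, "genes_by_organism", organism="yeast"))
print(run_template(conn, "genes_longer_than", min_length=1000))
```

The SQL stays hidden behind the template name, so the interface only has to ask for an organism or a minimum length.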

So guys, I hope this information will be helpful if you are a bioinformatics enthusiast like me, craving more insight into this field. In the next post I would like to discuss genome database systems in more detail. Till then, goodbye. Learn and empower yourselves.

Tuesday, October 29, 2013

Bioinformatics courses in Sri Lanka and around the World

Some guidance for people who would like to study bioinformatics!

Hey guys! This time I would like to list some of the degrees in bioinformatics and the universities that offer them.

If you would like to follow a bioinformatics course in Sri Lanka, you have the following options.

BSc (Hons) in Bioinformatics from the Faculty of Science, University of Colombo.
Study a Biology BSc program at any university and follow an MSc in Bioinformatics at IBMBB or PGIM, University of Colombo.
Study Computer Science & Mathematics at any university and follow an MSc in Bioinformatics at IBMBB or PGIM, University of Colombo.
After an Honours degree, contact professors or lecturers at universities, who can guide you on the path to completing a PhD in Bioinformatics.

Following is the link for IBMBB - http://www.ibmbb.lk/

If you would like to follow a bioinformatics course in another country, the following list will be useful to you.

Asia and Pacific region

Bioinformatics Institute, Singapore (Masters program)
National University of Singapore (Training and workshops)
Nanyang Technological University, Singapore (Master of Science in Bioinformatics)
National Tsing Hua University, Taiwan (Undergraduate course)
University of Hong Kong (Bachelor of Science in Bioinformatics)
University of Malaya, Malaysia (offers BSc & MSc in Bioinformatics)
Kebangsaan University, Malaysia (offers Bachelors of Science (Honours) in Bioinformatics)
Mohammad Ali Jinnah University, Pakistan (offers Bachelor of Science in Bioinformatics)
COMSATS Institute of Technology, Pakistan (offers Bachelor of Science in Bioinformatics)

Australia and New Zealand

University of Melbourne (Bioinformatics stream within BSc degree)
University of Otago - Dunedin, New Zealand (Basic courses)
University of Queensland (3rd-year course)
University of Sydney (BSc program)
University of New South Wales (Bachelor of Engineering in Bioinformatics Programme)
Flinders University (Bachelor's Programme in Bioinformatics)

United Kingdom

Birkbeck College, University of London (Accredited online course in protein structure) M.Sc. in molecular modelling and bioinformatics
Cranfield Centre for Bioinformatics and IT - MSc. Bioinformatics
Oxford University - Part-time MSc. in Bioinformatics
University of Abertay Dundee, UK Information Technology (Bioinformatics) - PGDip/MSc
University of Exeter - MSc/MRes/PgDip/PgCert in Bioinformatics
University of Leeds, M.Res. in Bioinformatics
University of Liverpool, M. Sc. in Biosystems and Informatics

United States of America

Baylor College of Medicine, USA Ph.D. in structural and computational biology and molecular biophysics
Boston University, USA M.S. and Ph.D. program in bioinformatics
Carnegie Mellon, USA B.S., M.S. and Ph.D. Merck Computational Biology and Chemistry Program
George Mason University, USA M.S. and Ph.D. in bioinformatics and computational biology
Georgia Institute of Technology, USA M.S. in bioinformatics; Ph.D. in Bioinformatics and Computational Biology
Iowa State University, USA Ph.D. program in bioinformatics and computational biology
Johns Hopkins University, USA Ph.D. program in computational biophysics
Keck Graduate Institute, USA M.S. and Ph.D. in Bioinformatics and Computational Biology
Marquette University/MCW, USA M.S. in bioinformatics
New Jersey Institute of Technology M.S. and Ph.D. in Computational Biology
North Carolina State University, USA M.S. and Ph.D. in bioinformatics
Northern Illinois University, USA M.S. specialization in bioinformatics
Northeastern University, USA M.S. in bioinformatics
Ohio State University, USA Graduate Program in Biomedical Informatics
Rensselaer Polytechnic Institute, USA B.S. in bioinformatics and molecular biology
Rutgers University, USA Ph.D. in computational molecular biology
Stanford University, USA M.S. and Ph.D. in biomedical informatics
University of Medicine & Dentistry of New Jersey, USA M.S. and Ph.D. program in biomedical informatics
University of Medicine & Dentistry of New Jersey, USA M.S. and Ph.D. Bioinformatics track
University of Albany, USA Ph.D. program in comparative functional genomics
University of California, Irvine, USA M.S. and Ph.D Informatics in Biology and Medicine
University of California, Los Angeles, USA M.S. and Ph.D. program in bioinformatics
University of California, Santa Cruz, USA B.S., M.S. and Ph.D. in Bioinformatics
University of California, San Diego, USA Ph.D. Bioinformatics
University of California, San Francisco, USA M.S. and Ph.D. program in biological and medical informatics
University of Cincinnati, USA M.S. and Ph.D. Biomedical Engineering-Bioinformatics track
University of Illinois, Chicago, USA M.S. & Ph.D. Bioinformatics
University of Memphis, USA M.S. concentration in bioinformatics
University of Michigan, Ann Arbor, USA M.S. & Ph.D. Bioinformatics
University of Minnesota, USA M.S. & Ph.D. Bioinformatics
University of Nebraska, Omaha, USA M.S. and Ph.D. in Bioinformatics
University of North Carolina, USA M.S. & Ph.D. Bioinformatics and Computational Biology
University of Pennsylvania, USA B.S., M.S. and Ph.D. programs in Computational Biology/Bioinformatics
University of Pittsburgh, USA M.S. and Ph.D. in biomedical informatics
University of the Sciences in Philadelphia, USA B.S. in bioinformatics
University of the Sciences in Philadelphia, USA M.S. in bioinformatics
University of Southern California, USA M.S. & Ph.D. in Molecular and Computational Biology
University of Texas at Austin, USA Ph.D. in Cell and Molecular Biology-Structural Biology and Bioinformatics track
University of Texas at El Paso, USA M.S. in bioinformatics
University of Washington, Seattle, USA Ph.D. Graduate program in computational molecular biology
University of Wisconsin-Madison, USA Graduate Certificate in Bioinformatics

So guys, I think this information will be useful to anyone who wants to study bioinformatics. Hope to see you soon with another exciting post. Until then, prepare for your future!

Monday, October 21, 2013

Prospects of Bioinformatics

Reincarnation - Back to the blogging life after ages !

It has been quite a while since my first post. In fact, it has taken more than a year to get to my second one. I was really busy with my academic work and dealing with other problems, so sorry guys for taking such a long time. Enough apologizing, let's get straight to the point.

I know many of you will wonder why on earth this writer wants to turn IT people into biologists. Actually, the main goal of this blog series is not to make IT guys biologists, but to give some idea of the other fields where an IT professional can apply himself beyond the typical industry opportunities. So today's post is about making you aware of these opportunities.

As you know, bioinformatics is an interdisciplinary field that combines several fields such as biology, biochemistry, molecular science and computer science. By studying bioinformatics you will gain knowledge in this variety of areas and be able to work along different paths. To make it simpler and more understandable, look at the following diagram.

As you know, any degree holder can go into an academic career, and so can bioinformatics students. They can work in universities as academics. Other than that, there are many areas in which a bioinformatics student can work.

These career prospects have been developing rapidly as the field improves, and job prospects can be seen in all sectors of biotechnology, pharmaceuticals, biomedical sciences, hospitals and research institutes.

Following are some of the specific career areas in bioinformatics.

Sequence assembly
Database design and maintenance
Sequence analysis
Proteomics (study of proteins)
Pharmacological
Pharmacology
Bioinformatics Developer
Computational chemist
Bioanalyst, etc.

Bioinformatics also has many broad roles for career movement, such as bioinformatician, biomedical computer scientist, geneticist, computational biologist and biostatistician.

So guys, I think you now have some idea of the career prospects in bioinformatics. In the near future I hope to share more about the salary scales in these fields, how you can study bioinformatics, the cutting-edge technology used, and the research areas in bioinformatics.

Monday, June 25, 2012

BIOINFORMATICS-Beyond the tradition

Hi guys, this is my 1st blog post, so there must be many mistakes; please feel free to correct me by leaving a comment. This blog is specially dedicated to all the students who did Biology for their A/Ls and are now in the IT scene (as is the writer ;)). This post might help you feel satisfied with your choice. Yes, you are right! You can go back to the field of biology as a bioinformatician ;) ;) :D. Keep on reading our blog!!!

For my 1st post, I thought of writing about a field that is growing fast around the whole world. The moment you read this blog's name you may feel, mmm, what should I say, "uncomfortable" with it (if you are in the field of IT or an IT student). Some of you might wonder whether this guy is nuts or crazy!! ;)

All these thoughts are to be expected, and after reading the complete post you might get what I mean, or you might not! ;) ;)

So let's move on to our 1st topic!

As I mentioned earlier, I am going to talk about an amazing field which has caught the eyes of the world. It is the field of BIOINFORMATICS. From the name alone, you might get the idea that this field is related to both IT and biology. Yes, that is true. That is why I have named my blog "Making IT ians , Biologists". So let's talk about this field.

According to Wikipedia, bioinformatics is the application of computer science and information technology methods to the field of biology and medicine. Since the 20th century, two fields have developed especially quickly in both technology and knowledge: biology and IT. The advancement of biology can be clearly seen in the research being carried out in neurocomputing.

Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, structural biology, software engineering, data mining, image processing, modeling and simulation, signal processing, discrete mathematics, control and system theory, circuit theory, and statistics.

Commonly used software tools and technologies in bioinformatics.

Java
XML
Perl
C
C++
Python
R
MySQL
SQL
CUDA
MATLAB
Microsoft Excel

This technology was first applied in the process of genome mapping, which is still an ongoing process. In genome mapping, the actual genome of a living cell or organism is mapped and kept in a database, and by using modeling we can even view 3D images of such genomes. You might ask, what is this so-called genome? The genome is the entire hereditary information of a living organism. It is stored in DNA, or in RNA in some viruses. These are the tiny helical molecules that carry our whole system's information, such as how we get our appearance, and that even influence our habits and how we think.

Moving back to the field of bioinformatics, its primary goal is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to glean understanding of biological processes. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. There are two ways of modeling a biological system: static and dynamic. These are categorized under the subfield of structural bioinformatics.
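
To give a feel for what "aligning sequences to compare them" means computationally, here is a minimal sketch of the classic Needleman-Wunsch dynamic programming approach for global alignment. It only computes the best alignment score, not the alignment itself, and the scoring values (match +1, mismatch -1, gap -1) are assumptions I picked for the example; real tools use more refined scoring schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of two sequences (Needleman-Wunsch).

    Only the previous row of the scoring table is kept, which is enough
    when the score is needed but the alignment itself is not.
    """
    # Row 0: aligning the empty prefix of `a` with prefixes of `b` costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Identical sequences get the highest possible score, and every mismatch or gap lowers it, which is what lets us rank how closely two sequences are related.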

So guys, this was a brief introduction to bioinformatics. I hope to continue writing more on this field, especially the research areas and the prospects of bioinformatics. I hope you enjoyed my 1st post on bioinformatics.

See you soon!! :D :D