


What is Bioinformatics?

The turn of the millennium is characterized by the dawn of a new scientific revolution, which will have as great an impact on society as the industrial and computer revolutions before it. This revolution was heralded by an exciting large-scale DNA sequencing effort in July 1995, when the entire genome of Haemophilus influenzae, a small Gram-negative bacterium, was published: 1.8 million base pairs, and the first complete genome of a free-living organism. Since then, the amount of DNA sequence data in publicly accessible databases has been growing exponentially and is now about to claim its biggest triumph: the complete 3.3 billion base-pair DNA sequence of the entire human genome, as pre-released by an international consortium of 16 institutes on 26 June 2000.

DNA sequences are valuable because they provide the most detailed anatomy possible for any organism - the blueprint of life, with the complete set of instructions for how each working part should be assembled and operated. Yet while identifying the order of letters in our genetic alphabet is an important and necessary first step, the much more complicated task still lies ahead: to work out what those letters mean, what they do, and what can be done if the messages they spell out are in error - that is, to lay bare the genetic triggers for hundreds of diseases of genetic predisposition.

A crucial problem to solve is the identification and location of genes - at present, nobody has even a ballpark figure for how many genes humans have. Equally important, if not more so, is understanding how genes are expressed, that is, which of the thousands of genes are active in a given tissue sample. This information is buried in the non-coding regions of the DNA, which, in humans, make up more than 95% of the entire genome. Beyond that, knowing the code for a gene does not imply that one knows the structure and function of the protein it produces - vital information required to understand how the genetic code locked in our cells ends up constructing and maintaining a fully functioning being. Thus, while sequencing experiments have succeeded in spelling out the book of life, this book is written in cryptic hieroglyphics that, to date, we do not understand.

Consequently, in the last decade, a new scientific discipline - bioinformatics - has emerged in an attempt to interpret the increasing amount of DNA sequence data. The problems faced are essentially statistical, owing both to the inherent complexity of biological systems - brought about by evolutionary tinkering - and to our lack of a comprehensive theory of life's organization at the molecular level. The task, then, is to extract patterns from large amounts of noisy data in the absence of general theories, that is, to learn the theory automatically from the data through a process of inference, model fitting, and learning from examples.
