Comparative genomics

Related Terms

Adenine, base, cytosine, DNA, gene, genetics, genome, guanine, regulatory element, sequence, thymine.

Background

DNA (deoxyribonucleic acid) is located in a compartment of the cell called the nucleus and is packaged in structures called chromosomes. Most human cells contain 46 chromosomes (23 pairs), and each chromosome contains hundreds to thousands of genes. Genes contain the instructions for making the proteins that do the work in the human body. Chromosomes also contain regulatory sequences, which are instructions that control how much of a gene's product will be made, when it will be made, and where in the body it will be made. An individual's genome is the sum total of the information contained in that individual's chromosomes.
DNA contains four different chemical compounds called bases: cytosine, thymine, guanine, and adenine. In any given person, these bases are found in a particular order along the chromosomes, and it is the order of these bases that encodes the information carried by genes. Even though the DNA sequences of individuals are very similar (on average, DNA is about 99.9% identical between any two people), the remaining differences are important.
Comparative genomics is an approach that can be used to compare the genomic DNA sequence of two or more different species or organisms. For example, researchers have compared the genomic DNA sequence of humans to that of chimpanzees.
The genomic DNA sequence of an organism is the DNA sequence of every chromosome that an organism has. For example, humans have a total of 24 unique chromosomes (chromosomes 1 through 22, as well as the X and Y sex chromosomes). Therefore, the genomic DNA sequence of a human is the sequence of all 24 of these chromosomes.
As noted above, chromosomes contain genes, which provide the instructions for making proteins, as well as regulatory sequences, which control how much of a gene's product will be made, when it will be made, and where in the body it will be made. Comparative genomics can be used to compare either the genes or the regulatory sequences of two different species.
To perform a comparison between two different genomes, the DNA sequence of each genome must be obtained. After a DNA sequence is obtained, it is analyzed using computer programs. Based on what a researcher is interested in, the electronic analysis can reveal a wide variety of information about the species being compared. For example, it could let a researcher know whether one species has specific genes that another species does not have. Alternatively, it could let a researcher know if the genes in one species have evolved, or changed, relative to the genes in another species.

Methods

Sequence DNA: In order to perform a comparison between two different genomes, the DNA sequence of each genome must be obtained. DNA contains four different chemical base compounds: cytosine, thymine, guanine, and adenine. In any given species, these bases are found in a particular order along the chromosomes. The order of these bases encodes both the genes and the regulatory elements that control how much of a gene's product will be made, when it will be made, and where in the body it will be made.
Obtaining the DNA sequence of an entire genome can be time consuming. For example, the DNA sequence of the human genome contains about 3 billion base pairs. DNA sequences are usually obtained using a DNA sequencing machine, which reads the order of bases along a chromosome. After the sequence is read by the machine, it is stored in a computer.
Compare genomes: When the DNA sequences from the genomes of two or more species have been obtained, a comparative genomics analysis can be performed. The stored sequences are compared using a computer program. For example, a program called BLAST (Basic Local Alignment Search Tool) can find regions of similarity between the genomic sequences of two organisms. In general, a researcher will use such a program to find similarities or differences between the genomic sequences of two different organisms.
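The idea of measuring similarity between two sequences can be illustrated with a minimal Python sketch. It is not how BLAST works internally (BLAST searches for local regions of similarity using its own heuristics); it simply assumes the two sequences have already been aligned to the same length and counts matching positions. The function name and the short example sequences are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare the sequences position by position, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical ten-base stretches from two species (not real data).
species_1 = "ACGTACGTAC"
species_2 = "ACGTACCTAC"
print(percent_identity(species_1, species_2))  # 90.0
```

Real genome comparisons must first work out which positions correspond to each other, which is the harder part of the problem and the reason dedicated alignment programs are used.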
When genomic DNA sequences are compared, researchers can look for small or large differences between the genomes. A small difference between two genomes may be the change of a single base at a particular location on a chromosome. For example, one species may have the base cytosine at a specific location, but another species may have the base adenine in the same location. Two different species may have larger differences in their genomes as well, even if they are closely related evolutionarily (organisms are closely related evolutionarily if they both recently evolved from the same ancestral organism). For example, one species may have two copies of a large stretch of 100,000 bases, such that this region is present twice in the genome of that species, but it may be present only once in the genome of a related species.
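Both kinds of difference described above can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions: the sequences are tiny, invented strings, the single-base comparison assumes the two sequences are already aligned, and the copy count looks only for exact, non-overlapping matches of a known segment. All names are hypothetical.

```python
def single_base_differences(seq_a: str, seq_b: str) -> list:
    """List (position, base_a, base_b) for every mismatch between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def count_copies(genome: str, segment: str) -> int:
    """Count exact, non-overlapping occurrences of a segment in a genome string."""
    return genome.count(segment)


# Small difference: cytosine in one species, adenine in the other (position 3).
print(single_base_differences("ATGCCGTA", "ATGACGTA"))  # [(3, 'C', 'A')]

# Large difference: a segment present twice in one genome and once in another.
segment = "GACCGA"
genome_a = "TT" + segment + "AA" + segment + "CC"
genome_b = "TT" + segment + "AACC"
print(count_copies(genome_a, segment), count_copies(genome_b, segment))  # 2 1
```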

Research

By comparing the genomic DNA sequences between different species, researchers can learn about how those species are similar or different. Regions of the genome that remain very similar in sequence when compared among different species may have an important function. For example, they may provide the information for making the basic structure of a cell. If the sequence of a specific genomic region changes very little across species, it suggests that the region is needed by all of those species to carry out some critical biological process.
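One simple way to picture the search for regions that change very little is a sliding window over aligned sequences from several species. The Python sketch below is an assumption-laden illustration rather than an established method: it treats a column as conserved only if every species has the same base there, and it reports windows in which the fraction of conserved columns meets a threshold. The function name, window size, threshold, and sequences are all hypothetical.

```python
def conserved_windows(aligned: list, window: int = 5, min_identity: float = 1.0) -> list:
    """Return start positions of windows whose fraction of identical columns
    (same base in every sequence) is at least min_identity."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    # A column is identical when every sequence has the same base at that position.
    identical = [len(set(column)) == 1 for column in zip(*aligned)]
    starts = []
    for start in range(length - window + 1):
        fraction = sum(identical[start:start + window]) / window
        if fraction >= min_identity:
            starts.append(start)
    return starts


# Three invented, pre-aligned sequences standing in for three species.
alignment = [
    "ACGTACGTTGCA",
    "ACGTACCTAGCA",
    "ACGTACGTCGCA",
]
print(conserved_windows(alignment, window=5))  # [0, 1]
```

In this toy alignment the first six positions are identical in all three sequences, so the five-base windows starting at positions 0 and 1 are reported as conserved.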
If a specific genomic region differs between two species, it could mean that the region is involved in a difference between the two species. However, some areas of the genome may not have any function at all, so differences in the genomic DNA sequence of these areas between two species may not be important. In some cases, it may be difficult for researchers to understand which changes between two genomes have important functional consequences and which do not. In these cases, researchers may need to perform additional experiments, beyond a comparative genomics analysis, to find the answer.
The genomes of both humans and chimpanzees have been sequenced and compared. Researchers have found that about 96% of human and chimpanzee DNA is identical; however, roughly 35 million single-base differences remain between the two genomes, along with additional inserted or deleted stretches of DNA. Some of these differences are likely to be responsible for the differences between the two species, such as the higher intelligence of humans.
Using comparative genomics, researchers are able to look for differences in the number and types of genes between different organisms, and they are able to look for differences within the same gene between organisms. Additionally, researchers are able to study regions of the genome that do not make genes. Some of these regions have important functions, such as regulating when or where a gene is made. Using comparative genomics, researchers can identify new regulatory regions, and also look for differences in regulatory regions between species. All of these changes may be involved in generating the different traits and characteristics observed among different organisms.
Evolution, which involves genomic changes that accumulate as species diverge from a common ancestor, can be studied at the level of an organism's DNA. Organisms are closely evolutionarily related if they both recently evolved from the same ancestral organism. By comparing DNA sequences from different species and measuring how similar they are, researchers can determine how closely related the different species are to each other. Species whose DNA sequences are very similar to one another's are considered to be closely related, whereas species whose DNA sequences have more differences than similarities are considered to be more distantly related. This is because the more time that has passed since two species diverged, the more changes their DNA has accumulated due to mutations, or changes in the genetic sequence.
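The reasoning that more similar sequences imply closer relatedness can be sketched as a pairwise comparison. The Python example below is a deliberately simplified illustration: it assumes the sequences are already aligned, uses the raw fraction of differing positions as the distance, and works on invented sequences with placeholder species names. Real analyses rely on alignment software and statistical models of how DNA changes over time.

```python
from itertools import combinations


def fraction_different(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def pairwise_distances(sequences: dict) -> dict:
    """Map each pair of species names to the fraction of differing positions."""
    return {
        (name_1, name_2): fraction_different(sequences[name_1], sequences[name_2])
        for name_1, name_2 in combinations(sequences, 2)
    }


def closest_pair(sequences: dict) -> tuple:
    """Return the pair of species with the smallest distance."""
    distances = pairwise_distances(sequences)
    return min(distances, key=distances.get)


# Invented sequences; the species names are placeholders, not real organisms.
toy_sequences = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTAT",
    "species_c": "ACCTTCGAAT",
}
print(closest_pair(toy_sequences))  # ('species_a', 'species_b')
```

Here species_a and species_b differ at one position out of ten, while each differs from species_c at three or four positions, so the first two would be judged the most closely related under this simple measure.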

Implications

Human disease: Comparative genomics may be useful in helping scientists better understand human diseases. As an example, there are many functional regions of the human genome that researchers have not yet identified, especially regulatory regions that control when and where a gene is made. Through comparative genomics, researchers may be able to identify some of these regulatory regions. If a specific region of the genome is highly similar between humans and several other species, it suggests that the region has an important function and may be involved in gene regulation. Human diseases may be caused by mutations, or errors, in the sequences of genes, but they may also be caused by mutations in the regulatory regions that control genes. Therefore, identifying new regulatory regions in the human genome may help scientists understand why specific mutations cause certain diseases.
Infection: Comparative genomics analysis may be used to better understand how some microorganisms, such as bacteria, cause infection. For example, there are several different but closely related strains of Escherichia coli (E. coli) bacteria, some of which cause infection in humans and some of which do not. By comparing the genomes of these different strains, researchers have been able to identify differences in genes that may be involved in infection. By studying these variations in genes, researchers may be able to develop new strategies to combat infection.
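Once the genes in each strain have been identified, a comparison of gene content can be pictured as a set comparison. The Python sketch below is only an illustration of that idea: the gene names are placeholders rather than real E. coli genes, and it assumes each strain's genes have already been catalogued under consistent names.

```python
def compare_gene_content(strain_a: set, strain_b: set) -> dict:
    """Split two gene sets into genes shared by both and genes unique to each."""
    return {
        "shared": strain_a & strain_b,
        "only_a": strain_a - strain_b,
        "only_b": strain_b - strain_a,
    }


# Placeholder gene names, invented for illustration only.
infectious_strain = {"geneA", "geneB", "geneC", "geneD"}
harmless_strain = {"geneA", "geneB", "geneE"}

result = compare_gene_content(infectious_strain, harmless_strain)
print(sorted(result["only_a"]))  # ['geneC', 'geneD']
```

Genes found only in the infectious strain would be candidates for follow-up experiments; the comparison alone does not show that they cause infection.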

Limitations

Although comparative genomics can provide researchers with a large amount of useful information, sequencing the entire genome of an organism may be time consuming, expensive, and difficult. Therefore, considerable time and resources must be invested to obtain valuable information from comparative genomics analyses.
Comparative genomics can be used to find similarities and differences in the DNA sequence between two different organisms. However, comparative genomics analyses are not able to provide definitive answers as to what these similarities or differences mean or whether they are important. Once a genomic comparison has been performed, researchers may need to do additional follow-up experiments to better understand the function or the relevance of a specific genomic region.

Future research

The ability of researchers to extract meaningful information from comparative genomics analyses depends on the availability of genomic DNA sequences. To date, the genomes of many organisms have been sequenced, including humans, chimpanzees, mice, worms, and flies. However, researchers continue to sequence the DNA of additional organisms. For example, the sequencing of the Neanderthal genome is an ongoing project. The more closely related the genomes are, the better researchers are able to interpret what differences between those genomes mean. For example, by comparing the human genomic DNA sequence to that of the closely related Neanderthal, researchers hope to learn more about recent human evolution.

Author information

This information has been edited and peer-reviewed by contributors to the Natural Standard Research Collaboration (www.naturalstandard.com).
