DNA sequencing and sequence variation

Related Terms

Adenine, base, chromosome, cloning, cytosine, diagnosis, DNA, gel, gene, genetic counseling, genome, genomic sequencing, guanine, inherited genetic disease, knockout mouse, PCR, polymerase chain reaction, polymorphisms, regulatory sequence, sequence, sequencing DNA, thymine, polymorphism.

Background

DNA sequencing is a technique that researchers use to determine the order of the building blocks of DNA (deoxyribonucleic acid) in a gene or along a chromosome. In human cells, most DNA is located in a compartment called the nucleus, where it is packaged into structures called chromosomes. Human cells contain 46 chromosomes (organized into 23 pairs), and each chromosome carries hundreds to thousands of genes. Genes contain the instructions for making proteins, which perform most of the functions in the human body. Chromosomes also contain many regulatory sequences. A regulatory sequence is a region of DNA that controls how much of a gene's product is made, when it is made, and where in the body it is made.
DNA contains four different chemical compounds called bases: cytosine, thymine, guanine, and adenine. In each person, these bases occur in a particular order along the chromosomes, and this order stores the information encoded in genes. Although the DNA sequences of any two people are very similar (on average, about 99.9% identical and 0.1% different), the differences are important.
Differences in DNA between people may be responsible for differences in traits such as height. Changes in DNA, called mutations, can cause genetic diseases. For example, Duchenne muscular dystrophy is caused by mutations in the dystrophin gene and leads to a progressive loss of muscle function. By identifying the specific genes that are mutated in a disease, researchers can better understand how the disease develops, and they may be able to use this information to develop drugs to treat it.

Methods

Obtain DNA: Sequencing requires many copies of the DNA of interest, so researchers may begin by cloning it. DNA cloning is a technique for creating multiple copies of a piece of DNA. To clone DNA, researchers insert it into a self-replicating genetic element, such as a plasmid, that copies itself when placed in an appropriate host cell, such as a bacterium. After the host cells have multiplied, the copied DNA is isolated and can then be sequenced.
Researchers may also use a technique called polymerase chain reaction (PCR) to produce more DNA for sequencing. PCR is a chemical reaction, carried out in a test tube, that makes many copies of a specific piece of DNA. Because the number of copies can roughly double with each cycle of the reaction, a small starting amount of DNA can be amplified to millions of copies. After amplification by PCR, the DNA may be sequenced directly, or it may be cloned and then sequenced.
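Because each PCR cycle can at most double the number of copies, the amount of DNA grows exponentially with the number of cycles. The Python sketch below estimates the copy number under the simplifying assumption of a fixed per-cycle efficiency (1.0 meaning perfect doubling). Real reactions slow down and level off in later cycles, and the efficiency parameter here is an illustrative assumption rather than a measured value.

```python
def pcr_copies(starting_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of target DNA copies after a PCR run.

    Each cycle multiplies the copy number by (1 + efficiency), so
    efficiency=1.0 models ideal doubling. This ignores the plateau
    that real reactions reach as reagents run out.
    """
    if starting_copies < 0 or cycles < 0:
        raise ValueError("starting_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


# One template molecule after 30 cycles of ideal doubling: 2**30 copies.
print(pcr_copies(1, 30))  # 1073741824.0
```

Thirty cycles of ideal doubling turn a single molecule into about a billion copies, which is why even a tiny sample can yield enough DNA to sequence.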
Perform sequencing reaction: Once a DNA sample has been obtained, a sequencing reaction is performed on it. In this reaction, an enzyme called a DNA polymerase synthesizes new DNA in a test tube, using the target DNA as a template and free nucleotides (the building blocks that carry the bases cytosine, thymine, guanine, and adenine). The reaction is designed so that the target DNA is copied many times, but each time, synthesis of the new strand is stopped at a different position along the sequence. This approach is known as chain-termination (Sanger) sequencing. At the end of the reaction, the result is a collection of DNA products of many different lengths, each a partial copy of the target DNA that ends where synthesis stopped.
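The logic of the reaction can be simulated in a few lines. The Python sketch below is an idealized model, not a description of real reaction chemistry: it assumes the template is only the short stretch being read, ignores the primer, and assumes that across the many copies, synthesis is stopped at every possible position. It also reflects the fact that the polymerase builds the complementary strand, so the products are pieces of the template's reverse complement. The function names are hypothetical helpers for this illustration.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Strand a polymerase builds on the template, written 5'->3'.

    The new strand is complementary to the template and runs in the
    opposite direction, i.e. it is the template's reverse complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def terminated_fragments(template: str) -> list[str]:
    """All products of an idealized chain-termination reaction.

    Synthesis can be stopped after any base, so across many copies the
    reaction yields one product ending at every position of the new
    strand. The primer is ignored for simplicity.
    """
    new_strand = synthesized_strand(template)
    return [new_strand[:end] for end in range(1, len(new_strand) + 1)]


print(terminated_fragments("ATGC"))  # ['G', 'GC', 'GCA', 'GCAT']
```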
Researchers can use the products of the sequencing reaction to determine the sequence of the original piece of DNA. The reaction mixture includes modified bases (dye-labeled dideoxynucleotides), and each of the four types carries its own fluorescent dye. When one of these modified bases is added to a growing strand, synthesis of that strand stops, so each product carries a single dye tag on its last base. The color of that tag tells researchers which base (cytosine, thymine, guanine, or adenine) was the last one added to the strand.
Separate DNA products by size: By arranging the products from smallest to largest and reading the fluorescent tag on the last base of each, researchers can determine the sequence of the DNA strand. (Strictly, this gives the sequence of the newly synthesized strand; because A always pairs with T and G with C, the sequence of the original template follows directly.) To separate the products by size, researchers place them in a gel-like material and apply an electric field. DNA is electrically charged, so the pieces migrate through the gel, and smaller pieces move faster than larger ones. Commonly, an automated DNA sequencing machine separates the pieces and reads the fluorescent tags to determine the order of bases.
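Reading a sequence from the separated products amounts to sorting them by length and reading off the final base of each. The Python sketch below assumes an ideal run with exactly one product of every length and an unambiguous dye color for each; real sequencers must also cope with noisy signals, which this sketch ignores. Because the products are copies of the newly synthesized strand, the last step converts the result back to the template by taking its reverse complement. The function names are hypothetical.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_bases(products: list[tuple[int, str]]) -> str:
    """Read a sequence from (length, tagged_base) pairs.

    Each pair stands for one reaction product: its length in bases and
    the base identified by the dye on its last position. Sorting by
    length mimics electrophoresis (smallest pieces arrive first), and
    reading the tags in that order spells out the new strand.
    """
    ordered = sorted(products, key=lambda item: item[0])
    lengths = [length for length, _ in ordered]
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("expected exactly one product of each length 1..n")
    return "".join(base for _, base in ordered)


def reverse_complement(seq: str) -> str:
    """Convert the synthesized strand back to the template strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Products arrive in arbitrary order; only lengths and dye colors are known.
products = [(3, "A"), (1, "G"), (4, "T"), (2, "C")]
new_strand = call_bases(products)
print(new_strand, reverse_complement(new_strand))  # GCAT ATGC
```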
After a DNA sequence is obtained, it is written as a long string of letters, with A for adenine, T for thymine, G for guanine, and C for cytosine. For example, a short DNA sequence may appear as AGCCTGATCCGGGATCAGCTTAAAGCTTAGCCGTAAAAAGT. Typically, about 500 to 1,000 bases can be read in one sequencing reaction.
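Because a sequence is simply a string of four letters, it is easy to check and summarize in software. The Python sketch below validates a sequence and counts each base, using the example sequence from the text. It assumes only the letters A, C, G, and T are present and rejects anything else; real sequencer output can contain other symbols, such as N for a base that could not be determined.

```python
from collections import Counter

VALID_BASES = "ACGT"


def base_composition(seq: str) -> dict[str, int]:
    """Count each base in a sequence written with the letters A, C, G, T."""
    seq = seq.strip().upper()
    unexpected = set(seq) - set(VALID_BASES)
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    counts = Counter(seq)
    return {base: counts[base] for base in VALID_BASES}


example = "AGCCTGATCCGGGATCAGCTTAAAGCTTAGCCGTAAAAAGT"
print(len(example), base_composition(example))
# 41 {'A': 13, 'C': 9, 'G': 10, 'T': 9}
```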

Research

Identifying disease genes: The DNA sequences of genes can help researchers identify mutations that cause disease. If a specific gene appears to be responsible for a disease, researchers can sequence it to look for mutations. Typically, the gene is sequenced both in individuals who have the disease and in healthy individuals. Genetic changes found in the individuals with the disease but not in the healthy individuals may be involved in causing the disease.
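At its core, this comparison is a set operation. The Python sketch below assumes each person's sequencing result has already been reduced to a set of variant labels (the labels shown are hypothetical) and keeps only variants never seen in a healthy individual, counting how many affected individuals carry each one. It ignores complications a real study must handle, such as sequencing errors or healthy people who carry a disease-causing variant without symptoms, so its output is a list of candidates, not proof of cause.

```python
from collections import Counter


def candidate_variants(
    affected: list[set[str]], healthy: list[set[str]]
) -> list[tuple[str, int]]:
    """Rank variants that are absent from every healthy individual.

    Each person is represented by the set of variants found when their
    copy of the gene was sequenced. Returns (variant, number of affected
    carriers) pairs, most widely shared variant first.
    """
    seen_in_healthy = set().union(*healthy)
    counts = Counter(
        variant
        for person in affected
        for variant in person
        if variant not in seen_in_healthy
    )
    return counts.most_common()


# Hypothetical variant labels: position in the gene, then reference>observed base.
affected = [{"120A>G", "305C>T"}, {"120A>G"}, {"120A>G", "410G>A"}]
healthy = [{"305C>T"}, {"88T>C"}]
print(candidate_variants(affected, healthy))  # [('120A>G', 3), ('410G>A', 1)]
```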
In some diseases, such as cancer, a patient's cells may carry large chromosomal deletions that remove many genes. The loss of these genes may contribute to the processes that make cells cancerous. By consulting a reference DNA sequence, researchers can list all the known genes in the region that is deleted in a patient's cancer cells. This helps them narrow down which genes may be responsible for the cancer.
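Finding the genes in a deleted region is an interval-overlap question: a gene is affected if its coordinates on the chromosome overlap the coordinates of the deletion. The Python sketch below assumes a gene annotation is already available as (name, start, end) positions on a single chromosome; the gene names and positions are hypothetical placeholders, not real genomic data. It also reports whether each gene lies entirely inside the deletion or only partly.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene on the chromosome
    end: int    # last base of the gene (inclusive)


def genes_in_deletion(
    genes: list[Gene], del_start: int, del_end: int
) -> list[tuple[str, bool]]:
    """List genes that overlap a deleted region.

    Returns (gene name, fully_deleted) pairs; fully_deleted is False for
    genes that are only partly inside the deletion.
    """
    hits = []
    for gene in genes:
        if gene.start <= del_end and gene.end >= del_start:  # intervals overlap
            fully_deleted = gene.start >= del_start and gene.end <= del_end
            hits.append((gene.name, fully_deleted))
    return hits


# Hypothetical gene names and coordinates, for illustration only.
annotation = [
    Gene("GENE_A", 1_000, 5_000),
    Gene("GENE_B", 8_000, 12_000),
    Gene("GENE_C", 20_000, 26_000),
    Gene("GENE_D", 30_000, 34_000),
]
print(genes_in_deletion(annotation, 4_000, 21_000))
# [('GENE_A', False), ('GENE_B', True), ('GENE_C', False)]
```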
Identify polymorphisms: Although the DNA sequences of individuals are nearly identical, about 0.1% of DNA differs between any two people. A polymorphism is a difference in DNA sequence between individuals, and a single nucleotide polymorphism (SNP) is a polymorphism that involves just one base. Many polymorphisms are thought to cause no differences in physical or mental traits. However, some polymorphisms may change the function of a gene or the amount of protein it produces, and these can affect people's physical and mental traits. By sequencing DNA from many individuals, researchers can find genetic differences that may affect human traits. For example, polymorphisms in specific genes involved in odor detection have been found to cause some people to perceive odors differently from others.
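Once two sequences have been lined up base by base, finding single-base differences is a position-by-position comparison. The Python sketch below assumes the sequences are already aligned and of equal length, so it detects substitutions only, not insertions or deletions; the two short sequences are made up for illustration.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Single-base differences between two aligned sequences.

    Returns (position, base_in_a, base_in_b) with 1-based positions.
    Assumes the sequences are already aligned and the same length, so
    only substitutions (not insertions or deletions) are detected.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Share of aligned positions at which the two sequences agree, in percent."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return 100.0 * (1 - len(find_snps(seq_a, seq_b)) / len(seq_a))


person_1 = "ACGTTGCA"
person_2 = "ACGTCGCA"
print(find_snps(person_1, person_2))  # [(5, 'T', 'C')]
print(percent_identity(person_1, person_2))  # 87.5
```

At the roughly 99.9% identity described in the text, two people differ at about 1 base in 1,000; across a genome of about 3 billion bases, that is on the order of 3 million differences.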
Verify experimental DNA constructs: Many experiments require biologists to build DNA constructs. For example, to develop a genetically modified animal such as a knockout mouse (a mouse in which a specific gene has been inactivated), a researcher may use a gene-targeting construct. This construct contains DNA identical to parts of the mouse's own gene, along with DNA that inactivates the gene by replacing or interrupting it. Researchers usually build such constructs in the laboratory by cutting DNA pieces and joining them together. To confirm that a construct contains no errors, researchers often sequence it.
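One simple form of this check is to confirm that each sequencing read appears, letter for letter, in the sequence the construct was designed to have. The Python sketch below assumes the designed sequence is known and accepts exact matches only; real verification uses alignment software that tolerates occasional sequencing errors and reports each mismatch. Because a read may come from either strand of the DNA, the sketch also searches for the read's reverse complement. The construct and read sequences are hypothetical.

```python
from typing import Optional

# Translation table for base pairing: A<->T, C<->G.
PAIRING = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite DNA strand, read in its own 5'->3' direction."""
    return seq.upper().translate(PAIRING)[::-1]


def locate_read(read: str, construct: str) -> Optional[tuple[str, int]]:
    """Find an exact match for a sequencing read within a designed construct.

    A read can come from either strand, so both the read and its reverse
    complement are searched. Returns ("+" or "-", 0-based start position)
    or None if the read does not match the design anywhere.
    """
    read, construct = read.upper(), construct.upper()
    start = construct.find(read)
    if start != -1:
        return ("+", start)
    start = construct.find(reverse_complement(read))
    if start != -1:
        return ("-", start)
    return None


# Hypothetical designed construct and reads, for illustration only.
construct = "GGATCCATGAAACGTTAGCTCGAG"
print(locate_read("ATGAAACGT", construct))  # ('+', 6)
print(locate_read("CTCGAGCTA", construct))  # ('-', 15)
print(locate_read("ATGAATCGT", construct))  # None
```

A read that matches nowhere, like the third one, flags either an error in the construct or a sequencing error, and would prompt a closer look at that region.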
Sequence genomes: The genomic DNA sequence of an organism is the DNA sequence of every chromosome in that organism. Humans, for example, have 23 pairs of chromosomes. Chromosomes 1 through 22 are called autosomes, and the 23rd pair consists of the sex chromosomes (two X chromosomes in females, or one X and one Y in males). The human genomic DNA sequence therefore covers 24 distinct chromosomes: the 22 autosomes plus the X and Y chromosomes. In recent years, researchers have sequenced the genomes of a variety of organisms, including humans, chimpanzees, mice, worms, and flies. Genomic sequences are valuable tools because they help researchers understand how many genes an organism has and where on specific chromosomes those genes are located.

Implications

By sequencing DNA, researchers may identify which changes in a gene (or regulatory sequence) cause a particular genetic disease. They may also identify which changes in genes or regulatory sequences influence other human traits, such as height or intelligence.
Diagnosing disease: For many genetic diseases, researchers have identified the mutations responsible. If a patient has symptoms of a particular genetic disease, genetic testing, including DNA sequencing, may help confirm the diagnosis. For example, mutations in the frataxin gene cause Friedreich's ataxia, a disease in which nerve tissue progressively degenerates. Most affected people carry an abnormally long run of repeated copies of the three-base sequence GAA in this gene, which is usually detected with PCR-based tests; sequencing the frataxin gene can reveal rarer point mutations. For these tests, DNA is typically obtained from a blood sample.
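The repeat involved in Friedreich's ataxia is a run of back-to-back copies of GAA, so counting the repeat units in a known sequence is a simple pattern-matching task. The Python sketch below finds the longest uninterrupted GAA run in a sequence string using a regular expression. It is illustrative only: it assumes the repeat region is already available as text, whereas clinical laboratories usually size long expansions with PCR-based methods, and it deliberately encodes no clinical cutoff for what counts as an expansion. The sample sequence is made up.

```python
import re


def longest_gaa_run(seq: str) -> int:
    """Number of GAA units in the longest uninterrupted GAA repeat in seq."""
    # (?:GAA)+ matches one or more consecutive GAA units; each match is a run.
    runs = re.finditer(r"(?:GAA)+", seq.upper())
    return max((len(match.group()) // 3 for match in runs), default=0)


# Made-up sequence: a run of 9 GAA units, a break, then a run of 4.
sample = "TTT" + "GAA" * 9 + "CCC" + "GAA" * 4
print(longest_gaa_run(sample))  # 9
```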
Genetic counseling: If prospective parents have a family history of a genetic disease, they may choose to undergo genetic counseling to determine their chances of passing a disease on to their children. Each parent may have a specific gene sequenced to determine whether they carry a disease-causing genetic mutation.
For example, the acid alpha-glucosidase gene is mutated in patients with acid maltase deficiency (AMD; also called glycogen storage disease type II or Pompe disease), an autosomal recessive condition that causes glycogen to build up in the body's cells. Each person has two copies of the acid alpha-glucosidase gene, and a person must inherit two defective copies to develop AMD. People with only one mutated copy are called carriers: they usually show no signs of the disease but can pass the mutation to their children. If one parent is a carrier and the other has two normal copies, no child will have AMD, but each child has a 50% chance of being a carrier. If both parents are carriers, each child has a 25% chance of developing AMD, a 50% chance of being a carrier, and a 25% chance of inheriting two normal copies.
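These percentages follow from listing every equally likely combination of gene copies a child can inherit, one copy from each parent (a Punnett square). The Python sketch below performs that enumeration with exact fractions. The genotype notation ("N" for a normal copy, "m" for a mutated copy) is an assumption made for this illustration, and the model assumes simple recessive inheritance in which each parent passes on either copy with equal chance.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Child's status, indexed by how many mutated copies it inherits (0, 1 or 2).
STATUS = ("non-carrier", "carrier", "affected")


def offspring_risks(parent_1: str, parent_2: str) -> dict[str, Fraction]:
    """Chance of each status for a child, for a recessive condition like AMD.

    Each parent is written as two letters, one per gene copy:
    "N" for a normal copy and "m" for a mutated copy.
    """
    for parent in (parent_1, parent_2):
        if len(parent) != 2 or set(parent) - {"N", "m"}:
            raise ValueError(f"invalid genotype: {parent!r}")
    outcomes = Counter()
    # Each parent passes on one of its two copies with equal chance,
    # so the four combinations below are equally likely.
    for copy_1, copy_2 in product(parent_1, parent_2):
        mutated = (copy_1 == "m") + (copy_2 == "m")
        outcomes[STATUS[mutated]] += 1
    total = sum(outcomes.values())
    return {status: Fraction(outcomes[status], total) for status in STATUS}


both_carriers = offspring_risks("Nm", "Nm")
print({status: str(p) for status, p in both_carriers.items()})
# {'non-carrier': '1/4', 'carrier': '1/2', 'affected': '1/4'}
```

Calling offspring_risks("Nm", "NN") covers the case of one carrier parent: the chance of an affected child is zero and the chance of a carrier child is 1/2.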

Limitations

DNA sequencing may be used by researchers to determine the sequence of a region of DNA. However, DNA sequencing generally does not tell researchers the function of a specific region of DNA or of a specific gene. Additional follow-up experiments are generally necessary in order to understand the function of a DNA region, even after its sequence is known.
Because genomes contain millions to billions of bases (about 3 billion in humans), genomic DNA sequences can be time-consuming to generate, and they remain expensive to generate for some organisms. The amplification step (cloning or PCR) needed to produce enough DNA for sequencing also adds to the cost and time of sequencing.

Future research

New DNA sequencing methods have recently been developed that offer some advantages over traditional methods, chiefly faster and cheaper sequencing. In some of these methods, a single DNA fragment is copied many times onto a tiny bead (a solid particle to which DNA is attached), so that each bead carries many identical copies of one fragment. Many beads, each carrying a different DNA fragment, can then be analyzed simultaneously.
Although newer methods allow researchers to generate sequences more quickly, one drawback is that each individual sequence (or "read") is generally shorter than those obtained with traditional methods. Nevertheless, because they are faster and cheaper, the newer methods are likely to have a considerable impact.
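One way to see the cost of shorter reads is to count them. If every base of a genome is to be covered a given number of times on average, the number of reads needed is inversely proportional to read length, so shorter reads mean proportionally more of them. The Python sketch below applies this average-coverage relation; the genome size, read lengths, and 30-fold coverage target are illustrative assumptions, not figures from the text or from any particular sequencing instrument.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Reads required for a target average coverage.

    Uses the relation coverage = reads * read_length / genome_size,
    rearranged for the number of reads and rounded up.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Illustrative numbers only: a 3-billion-base genome read to 30x coverage.
for read_length in (800, 100):
    print(read_length, reads_needed(3_000_000_000, read_length, 30))
# 800 112500000
# 100 900000000
```

Reads one-eighth as long require eight times as many reads, which newer methods can offset only because each read is so much faster and cheaper to produce.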

Author information

This information has been edited and peer-reviewed by contributors to the Natural Standard Research Collaboration (www.naturalstandard.com).

Bibliography

Favello A, Hillier L, Wilson RK, et al. Genomic DNA sequencing methods. Methods Cell Biol. 1995;48:551-69.
Keller A, Zhuang H, Chi Q, et al. Genetic variation in a human odorant receptor alters odour perception. Nature. 2007 Sep 27;449(7161):468-72.
Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.
McCready ME, Carson NL, Chakraborty P, et al. Development of a clinical assay for detection of GAA mutations and characterization of the GAA mutation spectrum in a Canadian cohort of individuals with glycogen storage disease, type II. Mol Genet Metab. 2007 Dec;92(4):325-35.
National Center for Biotechnology Information.
National Human Genome Research Institute.
Natural Standard: The Authority on Integrative Medicine.
Pandolfo M. Friedreich ataxia: Detection of GAA repeat expansions and frataxin point mutations. Methods Mol Med. 2006;126:197-216.
University of Michigan DNA Sequencing Core.