Saturday, January 27, 2007 - 05:50
Considering how many genomes have been sequenced in the past decade, it seems amazing in retrospect that the first complete bacterial genome sequence was published only 12 years ago (1). Now, the Genome database at the NCBI lists 450 complete microbial genomes (bacteria and archaea), 1476 genomes from eukaryotes, 2145 viruses, and genome sequences from 407 phages.

Much of the methodology used for sequencing DNA is designed to confront one big technical hurdle: we can only determine the sequence of a small piece of DNA at a time. This means that you must break a larger piece of DNA into smaller pieces, determine the sequence of each piece, and then put the sequences back together.

Mapping vs. Shotgun

When people were sequencing smaller pieces of DNA in the 1980s, it was common to map the DNA first using restriction enzymes, so that you knew how the pieces fit together. At first, many insisted that this same strategy should be applied to genomes as well: genomes should be broken apart and each piece carefully mapped before sequencing began. On the other side, there was Craig Venter, arguing that genome sequencing would be much quicker with a shotgun approach.

Thinking along the lines of a traditional laboratory, where labor is cheap and reagents are expensive, the mapping approach seemed pretty logical. Each piece of DNA would be carefully mapped, so you would know where it fit into a larger piece, and then sequenced.

The downside of mapping first is the cost in time and labor. Currently, you can obtain sequences that are about 900 bases long using ABI instruments and chemistry. To sequence a genome like that of E. coli, which is 4,638,858 bp in length (2), by mapping it first, you would need at least 5,155 well-mapped fragments (4,638,858 / 900, rounded up) even with no overlap between them, and closer to 6,000 or more once neighboring fragments overlap.
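The fragment count above is easy to check with a little arithmetic. The Python sketch below is only an illustration: the genome length and the 900-base read length come from the post, while the function names and the 100-base overlap are assumptions made up for the example. It also treats the genome as a simple linear sequence, which glosses over the fact that the E. coli chromosome is circular.

```python
import math

GENOME_LENGTH_BP = 4_638_858  # E. coli genome length cited in the post (2)
READ_LENGTH_BP = 900          # approximate read length cited in the post


def min_fragments(genome_bp: int, read_bp: int) -> int:
    """Fewest end-to-end fragments needed to cover the genome exactly once."""
    return math.ceil(genome_bp / read_bp)


def fragments_with_overlap(genome_bp: int, read_bp: int, overlap_bp: int) -> int:
    """Fragments needed when each one overlaps the previous by overlap_bp bases."""
    if not 0 <= overlap_bp < read_bp:
        raise ValueError("overlap must be at least 0 and less than the read length")
    if genome_bp <= read_bp:
        return 1
    # After the first fragment, each new fragment adds (read - overlap) new bases.
    step = read_bp - overlap_bp
    return 1 + math.ceil((genome_bp - read_bp) / step)


# No overlap at all: the absolute floor.
print(min_fragments(GENOME_LENGTH_BP, READ_LENGTH_BP))                # 5155

# Hypothetical 100-base overlap between neighbors (an assumed value).
print(fragments_with_overlap(GENOME_LENGTH_BP, READ_LENGTH_BP, 100))  # 5799
```

With no overlap the floor is 5,155 fragments, and a modest 100-base overlap pushes the count to about 5,800, which is in the same range as the figure quoted above. Every one of those fragments would have to be mapped by hand before sequencing could start.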
The shotgun approach, where DNA is broken into many overlapping pieces, each piece is sequenced, and computer programs figure out how the pieces fit together, turned out to be much faster and less costly in terms of labor.

Today, genome sequencing uses a combination of mapping and shotgun sequencing. Large pieces of DNA, on the order of 150,000 bp, are first cloned in BACs (Bacterial Artificial Chromosomes). The positions of the BACs are mapped, so it's known where they fit relative to each other and where they overlap. Then the sequence of each BAC is determined using a shotgun strategy.

I'll write more on the shotgun approach in the next post.

Read part I.
Part III: Reads and chromats
Part IV: How many reads does it take?
Part V: checking out the library

References:
1. Fraser CM, et al. 1995. "The minimal gene complement of Mycoplasma genitalium." Science, 270(5235):397-403.
2. Koonin E. 1997. "Big Time for Small Genomes." Genome Research, 7:418-421.