Wednesday, September 24, 2008 - 01:43
Part IV. Assembling the details and making the case for a novel paramyxovirus

This is the fourth in a five-part series on an unexpected discovery of a paramyxovirus in a mosquito. In this part, we take a look at all the evidence we can find and try to figure out how a gene from a virus came to be part of the Aedes aegypti genome.
I. The back story from the genome record
II. What do the mumps proteins do? And how do we find out?
III. Serendipity strikes when we Blink.
IV. Assembling the details of the case for a novel mosquito paramyxovirus
V. A general method for finding interesting things in GenBank

In part III, I wrote about using Blink to find matches to the mumps RNA-dependent RNA polymerase (also known as a "replicase") and the surprise of finding a really good match in the genome of Aedes aegypti (a type of mosquito that carries yellow fever and dengue virus). The curious thing about this match is that no other metazoan genome contains a match to this viral protein.

This makes sense. Metazoans, like us, other animals, and insects, don't need to make copies of RNA sequences, especially not RNA molecules that are hanging out in the cytoplasm. The only guys who need an enzyme like this are viruses.

I can think of three possible explanations for why we might be seeing this gene in the Aedes aegypti genome.

Case A. There's a mistake in the genome assembly.
Case B. The replicase gene really is a normal part of the mosquito genome and somehow got missed in the blast search.
Case C. The replicase gene ended up in the Aedes aegypti genome through the actions of a retrotransposon, and the presence of this sequence might be unique to the strain of mosquitoes used for the genome sequence.

There could be more explanations, but these are all I can think of right now. Let's go through each case and see whether the evidence supports or refutes it.

Case A. There's a mistake in the genome assembly.

When I first found the replicase gene in the mosquito genome, I was pretty sure that this was a mistake in the sequence assembly. There are quite a few repetitive sequences in the Aedes aegypti genome surrounding the replicase gene. Those kinds of sequences are notorious for causing mistakes in assemblies, because a read that falls inside one copy of a repeat could just as well have come from any other copy. And the replicase gene appears to be located in a supercontig whose assembly hasn't yet been completed.
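Here's a toy illustration of why repeats cause trouble (my own sketch, not the method the Broad's assembler actually uses). If you chop a sequence into short words of length k, called k-mers, any word that occurs at more than one position can't be placed uniquely from its sequence alone. The function name and the test sequence are made up for illustration.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict:
    """Map each k-mer that occurs more than once in seq to its 0-based start positions."""
    positions = defaultdict(list)
    # Slide a window of length k along the sequence and record where each k-mer starts.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions: these are the ambiguous ones.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# A made-up sequence with the same 8-base block at both ends.
toy = "ACGTTGCA" + "TTTTT" + "ACGTTGCA"
print(repeated_kmers(toy, 8))  # {'ACGTTGCA': [0, 13]}
```

A read that matches only `ACGTTGCA` could belong at position 0 or position 13, and nothing in the read itself says which. Real assemblers use read pairs and longer reads to bridge repeats like this, but when a repeat is longer than those can span, the assembly around it can end up collapsed or stitched together incorrectly.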
There was another reason to suspect an artifact. In part II, we found that Li et al. (1) discovered a new virus in a cell line when they were trying to identify genes turned on by angiotensin.

But the more I thought about it, the less this explanation made sense. When Li and colleagues did their work, they were isolating RNA, making DNA copies (cDNA), and sequencing the cDNA. In their case, it makes sense that any RNA viruses that happened to be in the cells would also get converted to cDNA and sequenced along with the cells' own RNA molecules (in Li's case, rat RNA molecules).

The mosquito genome was sequenced by scientists at the Broad Institute and published in Science in 2007 (2). Reading the paper, we can see how the sequencing process worked. The Broad scientists made libraries of cloned DNA fragments, sequenced the DNA, then assembled the sequences together. Paramyxoviruses, however, have RNA genomes. You can't clone RNA unless you convert it to cDNA first, and that wasn't part of the process. It's sad, but the nice, neat explanation falls apart when we review the experiment.

Case B. The replicase gene really is a normal part of the mosquito genome and somehow got missed in the blast search.

To check this, I decided to look at VectorBase. VectorBase is a specialized database for researching the genomes of insects and other invertebrates that transmit disease. At VectorBase, you can find genome sequences for mosquitoes like Anopheles gambiae, Aedes aegypti, and the house mosquito Culex pipiens, as well as Ixodes scapularis (tick), Pediculus humanus (louse), and others. You'll also find standard tools for comparing and aligning sequences, tools for looking at gene expression, and some specialized tools for comparing genomes.
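If you wanted to script this kind of presence/absence check instead of reading alignments by eye, here's a minimal Python sketch. It assumes you already ran a BLAST search with tabular output (12 tab-separated columns, with the subject ID in column 2 and the e-value in column 11) and that you have a mapping from each subject sequence to the genome it came from. The e-value cutoff of 1e-10, the function names, the contig names, and the example lines are all invented for illustration; they are not the actual search results, and this isn't how VectorBase's own tools work.

```python
def genomes_with_hits(blast_lines, subject_to_genome, max_evalue=1e-10):
    """Return the set of genomes that have at least one hit with e-value <= max_evalue.

    blast_lines: lines of BLAST tabular output (12 tab-separated columns;
    column 2 is the subject ID, column 11 the e-value, assumed to parse as a float).
    subject_to_genome: a hypothetical mapping from subject ID to genome name.
    """
    found = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            found.add(subject_to_genome[subject])
    return found


def presence_table(genomes, found):
    """Mark each genome True (significant hit found) or False (no significant hit)."""
    return {genome: genome in found for genome in genomes}


# Invented example lines, NOT real search results.
hits = [
    "replicase\taedes_contig_A\t45.2\t2100\t1100\t40\t1\t2100\t5000\t11300\t0.0\t1500",
    "replicase\tculex_contig_B\t28.0\t60\t40\t3\t900\t958\t200\t379\t2.5\t30.1",
]
subject_to_genome = {"aedes_contig_A": "Aedes aegypti", "culex_contig_B": "Culex pipiens"}
genomes = ["Aedes aegypti", "Anopheles gambiae", "Culex pipiens"]

print(presence_table(genomes, genomes_with_hits(hits, subject_to_genome)))
# {'Aedes aegypti': True, 'Anopheles gambiae': False, 'Culex pipiens': False}
```

The weak Culex line (e-value 2.5) is the kind of chance match you'd expect from any large genome, so it falls below the cutoff. Where you set that cutoff is a judgment call; a looser one would let more noise through.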
image from the Public Health Library
I compared the region of the Aedes aegypti genome that contains the replicase gene with the corresponding regions in two other mosquito genomes, those of Anopheles gambiae and Culex pipiens. I found from this comparison, and from other blast searches, that the replicase gene is not present in those other mosquito genomes. Case B is ruled out.

Case C. The replicase gene ended up in the Aedes aegypti genome through the actions of a retrotransposon, and the presence of this sequence might be unique to the strain of mosquitoes used for the genome sequence.

Ruling out cases A and B leaves us with case C.

Retrotransposons are really cool elements in the genome that are a bit like retroviruses. They are found in the nuclear DNA. They can be transcribed into RNA and, here's what's wild about them, they make a reverse transcriptase that produces DNA copies of their RNA, plus an integrase that helps insert those copies into new places in the genome.

Why do I think a retrotransposon might be involved? Nene et al. (2) found that the Aedes aegypti genome is chock full of retrotransposons. Plus, there's a sequence from a retrotransposon called "Pao_Bel" overlapping the end of the replicase gene.
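The retrotransposon route in case C comes down to two steps: reverse transcription, which copies an RNA into DNA, and integration, which inserts that DNA into a chromosome. Here's a toy Python sketch of those two steps on plain strings. It's only an illustration under simplifying assumptions: it handles a single strand, the insertion site is arbitrary, and the sequences and function names are made up. Real retrotransposition works on double-stranded DNA and involves much more machinery.

```python
# Base pairing used when copying RNA into DNA: each RNA base is matched by its DNA partner.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the first-strand cDNA of an RNA template, written 5' to 3'.

    The new DNA strand is complementary and antiparallel to the template,
    so we pair each base and reverse the order.
    """
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(rna.upper()))


def integrate(host_dna: str, insert_dna: str, site: int) -> str:
    """Splice a DNA fragment into the host sequence at a 0-based position (toy model)."""
    return host_dna[:site] + insert_dna + host_dna[site:]


viral_rna = "AUGGCU"   # made-up fragment standing in for viral RNA
host = "GGGGCCCC"      # made-up stretch of host chromosome
cdna = reverse_transcribe(viral_rna)
print(cdna)                      # AGCCAT
print(integrate(host, cdna, 4))  # GGGGAGCCATCCCC
```

The same sketch shows why, back in case A, a DNA-only cloning pipeline wouldn't pick up an RNA virus: without the reverse transcription step, there's no DNA copy to clone.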
I should mention, too, that while Nene et al. didn't say anything about this replicase in their 2007 paper, they did mention finding 6 flavivirus sequences incorporated in the genome. Those sequences could have been put in the genome in a similar way.

What's the take-home message?

Now it's time to play Hercule Poirot and use those little gray cells to try and reconstruct what happened. I think an ancestor of the Liverpool strain mosquitoes used for the genome sequence was buzzing around one day, sucked some nectar from a plant, and got a snoot full of a plant virus. I don't know much about insect reproduction or how the virus ended up near the newly forming germ line cells, but these viruses can make cells fuse together, so I can imagine this happening somehow. When the mosquito cells were dividing, a retrotransposon's reverse transcriptase copied part of the viral RNA into DNA, and that DNA got integrated into the host genome.

It would be really interesting to see if other strains of Aedes aegypti share this gene, and maybe even to use PCR to try to find paramyxoviruses in wild insects.

Tomorrow, I will describe a general technique for finding anomalies and discovering interesting things in GenBank.

References:
- Z. Li, M. Yu, H. Zhang, D. Magoffin, P. Jack, A. Hyatt, H. Wang, L. Wang (2006). Beilong virus, a novel paramyxovirus with the largest genome of non-segmented negative-stranded RNA viruses. Virology, 346 (1), 219-228. DOI: 10.1016/j.virol.2005.10.039
- V. Nene, J. R. Wortman, D. Lawson, B. Haas, C. Kodira, Z. Tu, B. Loftus, Z. Xi, K. Megy, M. Grabherr, Q. Ren, E. M. Zdobnov, N. F. Lobo, K. S. Campbell, S. E. Brown, M. F. Bonaldo, J. Zhu, S. P. Sinkins, D. G. Hogenkamp, P. Amedeo, P. Arensburger, P. W. Atkinson, S. Bidwell, J. Biedler, E. Birney, R. V. Bruggner, J. Costas, M. R. Coy, J. Crabtree, M. Crawford, B. deBruyn, D. DeCaprio, K. Eiglmeier, E. Eisenstadt, H. El-Dorry, W. M. Gelbart, S. L. Gomes, M. Hammond, L. I. Hannick, J. R. Hogan, M. H. Holmes, D. Jaffe, J. S. Johnston, R. C. Kennedy, H. Koo, S. Kravitz, E. V. Kriventseva, D. Kulp, K. LaButti, E. Lee, S. Li, D. D. Lovin, C. Mao, E. Mauceli, C. F. M. Menck, J. R. Miller, P. Montgomery, A. Mori, A. L. Nascimento, H. F. Naveira, C. Nusbaum, S. O'Leary, J. Orvis, M. Pertea, H. Quesneville, K. R. Reidenbach, Y.-H. Rogers, C. W. Roth, J. R. Schneider, M. Schatz, M. Shumway, M. Stanke, E. O. Stinson, J. M. C. Tubio, J. P. VanZee, S. Verjovski-Almeida, D. Werner, O. White, S. Wyder, Q. Zeng, Q. Zhao, Y. Zhao, C. A. Hill, A. S. Raikhel, M. B. Soares, D. L. Knudson, N. H. Lee, J. Galagan, S. L. Salzberg, I. T. Paulsen, G. Dimopoulos, F. H. Collins, B. Birren, C. M. Fraser-Liggett, D. W. Severson (2007). Genome Sequence of Aedes aegypti, a Major Arbovirus Vector. Science, 316 (5832), 1718-1723. DOI: 10.1126/science.1138878