Monday, September 22, 2008 - 10:00
Part II. What do mumps proteins do? And how do we find out?

This is the second in a five-part series on an unexpected discovery of a paramyxovirus in mosquitoes, and a general method for finding interesting things.

I. The back story from the genome record
II. What do the mumps proteins do? And how do we find out?
III. Serendipity strikes when we Blink.
IV. Assembling the details of the case for a mosquito paramyxovirus
V. A general method for finding interesting things in GenBank

In Part I, we looked at the NCBI SeqViewer, and found a new way to check out a genome map and learn more about individual genes and proteins.

When we look at proteins from the mumps virus, what do we find? To be honest, I was a lot more random about this when I was playing, but once I found something, I realized it would be better to be more systematic and look at each protein one by one. If I go through all the proteins and systematically review them with Blink, for almost all of the proteins, I find the kind of things that I'd expect. I also find the links that will tell me what the proteins do - at least as far as we know.

Wait, wait, wait! What is Blink?

Oh, sorry. When protein sequences, either confirmed or predicted, enter GenBank, the NCBI has an automated system that uses blastp to compare these sequences to all of the sequences in the protein sequence databases. These results can be accessed by selecting the Blink link.

Why is Blink helpful? There are many reasons. First, you don't have to wait for blast or do the blast search yourself. Second, you get many more results than you would from a normal blastp search, and the results are organized by kingdom. Usually, when you do a blastp search, you only get about a hundred results by default, and you don't see everything that's there unless you think to look for it. Third, you can filter the results in interesting ways. For example, if you just want to see protein sequences that are in 3-D structures, you can do that.
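If it helps to picture what "organized by kingdom" and "filter the results" mean, here is a minimal sketch in Python. It does not talk to NCBI or reproduce how Blink works internally; the `Hit` record, its fields, and the sample values are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One precomputed match (hypothetical record, not an NCBI format)."""
    accession: str       # placeholder identifier, not a real accession
    kingdom: str         # taxonomic group the matching sequence belongs to
    has_structure: bool  # True if the sequence comes from a 3-D structure


def count_by_kingdom(hits):
    """Tally how many matches fall into each kingdom."""
    return Counter(hit.kingdom for hit in hits)


def only_with_structure(hits):
    """Keep only the matches that come from 3-D structures."""
    return [hit for hit in hits if hit.has_structure]


# Invented sample data, just to exercise the two functions.
hits = [
    Hit("hit_1", "Viruses", False),
    Hit("hit_2", "Viruses", True),
    Hit("hit_3", "Metazoa", False),
    Hit("hit_4", "Other", False),
]

print(count_by_kingdom(hits))
print([hit.accession for hit in only_with_structure(hits)])
```

The point of the tally is the same as in the Blink summary: a single unexpected entry in a kingdom where you expected nothing is easy to spot once the matches are grouped.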
You can also get multiple alignments and phylogenetic trees from Blink.

Anyway, back to our story. What do we get if we Blink when we're looking at the mumps proteins? First, I can find something about the function of each of the eight mumps proteins. This will be the focus of today's post.

To use Blink, I found a record for mumps in the NCBI genome database and selected the Sequence Viewer link (see part I for instructions). I liked this method because I could see all the proteins encoded by the mumps genome in one view and look at them one by one. Here's what this looks like:
I got this menu by holding my mouse over the red protein graph on the far left side of the map. You can see in the yellow menu that this protein is a nucleocapsid protein. Then, I clicked the Blink link at the bottom of the menu to find out what it does.
Not surprisingly, I found that the nucleocapsid protein only matched proteins from other viruses. The three sequences in the Other category were from constructs. Then I selected the link to the first sequence to see if I could learn more from the protein record. This part involved a bit of trial and error. Some records had information, some did not. This record told me that the mumps nucleocapsid protein, NP_054707.1, protects the viral RNA genome, along with some other information about the structure of the protein. The next two proteins are encoded by the same gene: V/P. The V protein is the smaller of the two proteins. And, the P protein shows us that GenBank is missing a spell check function. The P protein should be listed as a "phosphoprotein" but the name in the menu is "phoshoprotein."
Sigh. When I look at the Blink results, I can see that about half of the matching sequences have the same spelling error. I also see that this sequence matches proteins from 256 viruses and nothing in any other kingdoms. What does the phosphoprotein do? Selecting the link to the first sequence, I find some interesting things. First, this protein is made by editing the viral RNA. Ooooh! I love RNA editing. Second, this protein is part of the viral RNA polymerase and it helps make proteins. The other protein that's encoded by the same gene, the V protein, is made from unedited RNA. This protein matches sequences in 215 viruses and nothing else. And, it's thought to block interferon. Interferons are proteins that help defend us from certain kinds of viruses. The next protein is M, or membrane protein. Interestingly, this protein matches 468 viruses and one metazoan sequence. This is cool! Why? Because we are metazoans. If we click Metazoa, we see which metazoan sequence is matching our mysterious membrane protein. Our Blink results imply that this sequence is human. But, if we click the accession number for this sequence, we see that this sequence comes from a paper with this intriguing title:
Beilong virus, a novel paramyxovirus with the largest genome of non-segmented negative-stranded RNA viruses

Ah, hah! This wasn't a human sequence at all. Interestingly, the GenBank record states that this sequence came from a Homo sapiens (human) mesangial cell, but when I looked at the abstract for the paper, it says that this virus probably came from a rat mesangial cell line, not a human cell at all. It just goes to show, it's not enough to look at the databases; you do need to read the papers, or at least the abstracts.

Anyway, what does the membrane or matrix protein do for the mumps virus? The GenBank record for P33482 says that this protein is involved in assembly of the viral particle, and it interacts with the viral membrane.

Onward. Next we have the F or fusion protein. This protein matches sequences in 3793 viruses and 3 sequences in metazoans. Let's check out those metazoan sequences. The three metazoan sequences are:

1. Angrgm-52 from Homo sapiens, and
2. Two different entries for the same sequence: the original entry, EDV20972, and the same sequence as a reference sequence, XP_002116616. Both of these are described as hypothetical sequences from the genome of Trichoplax adhaerens.

What can we say about these results? First, the supposedly human Angrgm-52 sequence comes from the same set of supposedly human mesangial sequences that contained a paramyxovirus (Li et al.). The GenBank record for this sequence, AAL62340, hasn't been updated yet, but I think we can be pretty confident that this is a viral sequence and not a human sequence.

Next, I know very little about Trichoplax adhaerens. The genome sequencing proposal says it's a simple, multicellular marine organism that lives in tropical waters around the world. The proposal also has lots more information about it if you're interested.

Maybe.
When I look at the match, the aligning region is a little on the short side: only 100 of the 538 amino acids align to the hypothetical Trichoplax sequence. A paramyxovirus sequence may have gotten included in the Trichoplax genome assembly, but the evidence isn't as strong as it was for the rat cell line.
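To put a number on "a little on the short side," here's a quick sketch of the arithmetic in Python. The function name is my own invention, and the only inputs are the two numbers from the alignment above; this is just the fraction of the protein that aligns, not a score that blastp or Blink reports.

```python
def alignment_coverage(aligned_residues: int, protein_length: int) -> float:
    """Return the fraction of a protein covered by the aligned region."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    if not 0 <= aligned_residues <= protein_length:
        raise ValueError("aligned_residues must be between 0 and protein_length")
    return aligned_residues / protein_length


# 100 of the 538 amino acids align to the Trichoplax sequence.
coverage = alignment_coverage(100, 538)
print(f"{coverage:.1%} of the fusion protein aligns")  # prints "18.6% of the fusion protein aligns"
```

So less than a fifth of the fusion protein lines up with the Trichoplax sequence, which is why I'm more cautious here.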
I almost forgot. What does the mumps fusion protein do? From the GenBank record for P11236, we find that the fusion protein helps the membrane of the virus fuse with the membrane of the cell.

We're almost done. The next protein is a small hydrophobic protein. It only matches viral proteins, and the GenBank record for P22112 says that it probably functions by inhibiting TNF-alpha signaling and blocking apoptosis (a special kind of cell death) in infected cells. These functions are inferred from the sequence similarity to other proteins.

Our last protein for today is the hemagglutinin-neuraminidase. This protein matches 830 viral proteins, and it functions by binding to receptors - helping the virus to infect the right kind of cell - and by cutting sugars off of cells - allowing the virus to escape from dead cells.

Wheew! That's enough for today. We'll look at the last protein sequence (the L protein) in part III.

Reference:
Z. Li, M. Yu, H. Zhang, D. Magoffin, P. Jack, A. Hyatt, H. Wang, L. Wang (2006). Beilong virus, a novel paramyxovirus with the largest genome of non-segmented negative-stranded RNA viruses. Virology, 346 (1), 219-228. DOI: 10.1016/j.virol.2005.10.039