Part I. The back story from the genome record
Together, these five posts describe the discovery of a novel paramyxovirus in the Aedes aegypti genome and a new method for finding interesting anomalies in GenBank.
I. The back story from the genome record
II. What do the mumps proteins do? And how do we find out?
III. Serendipity strikes when we Blink.
IV. Assembling the details of the case for a mosquito paramyxovirus
V. A general method for finding interesting things in GenBank
I began this series on mumps intending to write about immunology and how vaccines work to stimulate the immune system. I still plan to write about vaccines, because I love immunology, but... well... along the way, I decided to play a bit with the sequences in the mumps genome. I can't help it. You can learn a lot about a virus from the proteins it makes.
Getting the back-story from the genome record
I did what I usually do when I want to learn about a new virus. I went to the NCBI and searched for the mumps virus genome among the 1600 or so eukaryotic viral genomes that have been sequenced. The genome record told me that mumps has a single-stranded RNA genome, 15,384 bases long. Unlike the genomes of some single-stranded RNA viruses, such as influenza, the mumps genome is in one piece.
Interestingly, mumps RNA can't be translated directly into protein. Mumps RNA is complementary to the RNA strand that would be used for translation. This means that mumps requires an extra step in the decoding process. The mumps RNA has to be copied first in order to produce the complementary copy that gets used to direct production of mumps proteins. This is a lot like transcription, except that the template is RNA instead of DNA.
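If it helps to see the "complementary copy" idea in action, here is a minimal sketch in Python. The nine-base fragment is made up for illustration (it is not real mumps sequence), and the function name is my own. All it shows is that the strand that gets translated is the base-paired partner of the genome strand, read in the opposite direction.

```python
# Watson-Crick base pairing for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna: str) -> str:
    """Return the complementary RNA strand, written 5' to 3'.

    The two strands run in opposite directions, so we walk the input
    backwards while swapping each base for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


# Hypothetical genome fragment, written 5' to 3'. NOT real mumps sequence.
genome_fragment = "UUUGCCCAU"

# The copy is the strand a ribosome could read.
mrna_like = complementary_strand(genome_fragment)
print(mrna_like)  # AUGGGCAAA
```

Notice that the copy begins with AUG, the usual start codon, while the genome strand itself has no AUG anywhere. That is the point: the coding information is there, but only the complementary copy presents it in a readable form.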
This business of copying RNA is unusual enough. Copying the RNA also happens in an unusual place. Normally, nucleic acids get copied in a special compartment, the nucleus. In the case of mumps and most other single-stranded RNA viruses, all the RNA copying happens outside of the nucleus, in the cytoplasm. Mumps, of course, also has to bring its own enzyme along to do the job, since eukaryotic cells don't normally do this kind of work.
When I followed the taxonomy links, I could also see that other paramyxoviruses, related to mumps, have been found in fish (salmon), snakes, dogs, sheep, and pigs.
On to the research
I went back to the mumps genome record and looked a bit closer.
It was then I noticed it. There was a brand new, itty bitty link below the graph of the genome.
"Click me!" it said.
Feeling a little like Alice in Wonderland, I clicked it, wondering all along if I was going somewhere interesting or falling into a rabbit hole.
The other end of the link
Luckily, it turned out to be interesting.
The green graph (top) shows the positions of the genes, the red graph on the bottom shows proteins. You can see the second gene encodes two different proteins. That's kind of neat. I found, too, that when I held my cursor over the sequences, menus appeared with links to various things that I could do. It turned out I could get FASTA sequences, GenBank records, and pre-computed BLAST results.
Web pages at the NCBI are oddly reminiscent of the games that my kids used to play. My daughters used to spend hours playing Millie's Math House and something with Fribbles. Even today, I can hear them singing along with the theme music to Millie's Math House. And there aren't any words!
Anyway, in Millie's Math House, you had to click objects to find out what they would do. The pages at the NCBI are designed the same way. There's no way of guessing ahead of time; you just have to take the plunge and either move your mouse over things or click on random objects, just in case.
Notice below that when I moved my mouse over one of the maps of a protein sequence, I found lots of links.
Next, I started randomly clicking protein sequences and finding out what they matched.
You can do this yourself and jump ahead or wait until tomorrow and see what I found.