How does grass grow in the extremely hot soils of Yellowstone National Park? The quest continues.
Read onward to see where we will go.
In our last episode, I discovered a new tab in the protein database (well, new to me anyway).
If you select this tab, you get a list of protein sequences that are similar, by blastp, to the amino acid sequences in protein structures.
Naturally, I clicked the tab, and then the Links link, to see what this was all about.
But I was still bothered by the whole notion of how a sequence gets identified as having a "Related Structure." What are the cutoff values? Where do they come from? Via Pfam? Via blast?
I spent (alright wasted) a long time searching around the NCBI website, trying to figure out how sequences got promoted to the Related Structures category. Finally, I gave up and asked a friend of mine at the NCBI who kindly referred me to the January publication in Nucleic Acids Research (2).
To paraphrase the paper (2), a protein sequence is considered to have a related structure if the amino acid sequence of the protein matches an amino acid sequence in a structure. (Okay, I guessed that already, but how well do they have to match?)
The criteria for matching are pretty conservative:
1. There must be 50 or more aligned amino acids.
2. Of those aligned amino acids, 30% or more must be identical.
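For the curious, here is a rough sketch of how the two criteria above could be checked for a single pairwise alignment. This is my own illustration, not NCBI's code: the function names, the use of "-" as the gap character, and the choice to count only columns where both sequences have a residue as "aligned amino acids" are all assumptions.

```python
# Hypothetical sketch of the two "Related Structures" criteria described above.
# Assumptions (not from the paper): gaps are written as "-", and only columns
# where both sequences have a residue count as "aligned amino acids".

MIN_ALIGNED = 50           # criterion 1: 50 or more aligned amino acids
MIN_PERCENT_IDENTITY = 30  # criterion 2: 30% or more of those are identical


def alignment_stats(aligned_query, aligned_subject, gap="-"):
    """Return (aligned, identical) counts for two gapped, equal-length strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == gap or s == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if q.upper() == s.upper():
            identical += 1
    return aligned, identical


def meets_related_structure_criteria(aligned_query, aligned_subject):
    """True if the alignment passes both cutoffs listed above."""
    aligned, identical = alignment_stats(aligned_query, aligned_subject)
    if aligned < MIN_ALIGNED:
        return False
    # Integer arithmetic avoids floating-point surprises right at the 30% boundary.
    return identical * 100 >= MIN_PERCENT_IDENTITY * aligned
```

You would pass in the two gapped lines from a blastp alignment, with the query and subject rows each joined into one string. In practice blastp already reports the alignment length and the percent identity, so this is only meant to make the arithmetic concrete.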
So, the sequence of our unidentified protein is related to the sequence of a protein in a structure file. Fine! What does that mean? How does that help us?
On the other side of the Related Structures tab, and what I found there
I clicked the tab, clicked the Links link (named "Related Structures") and saw that the Aspergillus terreus sequence, which was 28% identical to my amino acid sequence (with an E value of 4 x 10^-14), is related to a protein that has 8 identical subunits in a structure named 2CLB.
Since my 331 amino acid virus protein is significantly similar to the Aspergillus sequence, I think it's quite likely that my sequence is similar to the protein in the related structure, too. (Yes, it's my protein now, at least until I find a new pet molecule to play with.)
Oh yeah, and we might even get some ideas about what the protein does that helps the plant survive the heat. Who knows? Join us next week for the end of the story.
1. Márquez, L., et al. 2007. A Virus in a Fungus in a Plant: Three-Way Symbiosis Required for Thermal Tolerance. Science 26: 513-515.
2. Wang Y, Addess KJ, Chen J, Geer LY, He J, He S, Lu S, Madej T, Marchler-Bauer A, Thiessen PA, Zhang N, Bryant SH. 2007. MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res. 2007 Jan;35(Database issue):D298-300.