In which we're reminded that database searches are experiments, too.
One of the trickiest things about bioinformatics experiments is repeating them. The challenge isn't related to the validity of the original results. The challenge is that, unless you made your own database and kept it in the same state, the database you'll be using at a later time, sometimes even a day later, is a different database. And if you query a different database, you may get a different result.
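Suppose you do keep your own copy of a database. A cheap way to find out later whether it is still the same database is to record a checksum of the file next to your search results. Here is a minimal Python sketch. It assumes the database is a single local FASTA file, and `database_fingerprint` is a hypothetical helper, not part of any bioinformatics package:

```python
import hashlib


def database_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a local sequence database file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks (1 MiB by default) so a large FASTA file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If two fingerprints differ, the file changed between searches, and so might the hits. This only works for files you hold yourself. It can't tell you anything about a remote database like the ones at NCBI.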
The series that I'm currently posting is one that I started working on a couple of years ago. Originally, I was going to repost these stories as is, but it seemed best to add another twist and see if I could reproduce some of the results, or at least find out which results have changed. In the next few posts, you'll see the results of those experiments.
Playing catch-up with the latecomers
Hi! For those of you who've just joined us: we've gotten lost in some databases while hunting for information on huntingtin. If you'd like to catch up and come back later, you might want to read Hunting for huntingtin (part I).
If not, here's a brief synopsis of the plot and what we've done so far:
- learned about Woody Guthrie and Nancy Wexler
- found a couple of reviews describing Huntington Disease
- got the HD gene sequence and counted the number of CAGs
- learned that CAG codes for glutamine and that glutamine can form hydrogen bonds
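Counting CAGs by hand gets tedious. Below is a minimal Python sketch of the counting and translation steps. It uses a made-up toy sequence rather than the real HD gene, and the helper names are hypothetical. The codon table is only a two-entry excerpt of the standard genetic code. The sketch counts only uninterrupted CAG runs:

```python
import re

# Two-entry excerpt of the standard genetic code: both CAG and CAA encode glutamine (Q).
CODON_TO_AA = {"CAG": "Q", "CAA": "Q"}


def longest_cag_run(dna: str) -> int:
    """Return the number of codons in the longest uninterrupted CAG repeat."""
    # CAG can't overlap with itself, so non-overlapping regex matches find every maximal run.
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def translate_cag_run(n_repeats: int) -> str:
    """Translate n back-to-back CAG codons into the matching polyglutamine string."""
    return CODON_TO_AA["CAG"] * n_repeats


# Made-up toy sequence (NOT the real HD gene): 21 CAGs, then a CAA interruption.
toy = "ATGGCC" + "CAG" * 21 + "CAACCGCCG"
n = longest_cag_run(toy)
print(n, translate_cag_run(n))  # prints 21 followed by a string of 21 Qs
```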
Then we got curious about those extra CAGs and wanted to know whether they result from the disease or cause it. So we looked up huntingtin at the UCSC Genome Browser and saw that there are similar genes in mice, pigs, and zebrafish (plus a few other members of the animal kingdom that we didn't discuss).
Mice have a similar gene, and we know that the Jackson Lab is the place to go for all things mouse. Sure enough, the Jackson mouse breeders have made mice with extra CAGs, and ... the mice get the symptoms of HD.
So you guessed it: the extra CAGs are the cause, not the result.
As the fearless leader of this expedition, I vote that we now take a closer look at those extra CAGs.
Searching for the lost glutamines
You might remember that in part I, I mentioned looking for 3-D structures containing polyglutamine. I did find one structure with a polyglutamine sequence, but it looked like the crystallographers weren't able to resolve the part of the structure where the glutamines were supposed to be. Cn3D shows the missing glutamines in grey in the sequence window. The structure window shows this:
[Figure: Cn3D structure window]
Looking for other structures
Okay, so what can I do now? What would you do?
I decided to do a blastp search, since NCBI has a cool new feature that links protein sequences with a corresponding structure to their structure record in MMDB (the Molecular Modeling Database).
So I used blastp to search the human protein database with a sequence of 15 glutamines.
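For anyone who wants to rerun a search like this from a script instead of the web form, here is a minimal sketch. It writes the 15-glutamine query as FASTA and builds, but does not send, a request for NCBI's BLAST URL API (`Blast.cgi` with `CMD=Put`). Several details are assumptions: the database name `nr`, the optional `ENTREZ_QUERY` organism filter, and the helper name `blast_put_url`. The post doesn't say exactly which database settings were used.

```python
from typing import Optional
from urllib.parse import urlencode

# The 15-glutamine query, written as FASTA (header line, then sequence).
POLYQ_QUERY = ">polyQ_15\n" + "Q" * 15

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query: str, database: str = "nr",
                  entrez_query: Optional[str] = None) -> str:
    """Build (but do not send) a BLAST URL API 'Put' request for blastp."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein query vs. protein database
        "DATABASE": database,  # assumption: "nr"; the post's exact database is not stated
        "QUERY": query,
    }
    if entrez_query is not None:
        # Optional Entrez filter, e.g. "Homo sapiens[organism]" to limit hits to human.
        params["ENTREZ_QUERY"] = entrez_query
    return BLAST_URL + "?" + urlencode(params)


print(blast_put_url(POLYQ_QUERY, entrez_query="Homo sapiens[organism]"))
```

Submitting a `Put` request returns a request ID (RID) that is then polled with `CMD=Get`. Keep in mind that the same request can return different hits on different days, because the database behind it keeps changing.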
What did I find?
In 2005, my search gave this result: No significant similarity found.
This year, I got results.
But they're strange.
I have some perfect matches to things I've never heard of, like Vanderwaltozyma polyspora and Brugia malayi, and to some things I have heard of, like Anopheles gambiae (a type of mosquito) and Chlamydomonas.
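A perfect match to a 15-glutamine query should mean the hit contains an uninterrupted run of at least 15 Qs. Below is a minimal sketch for checking downloaded hit sequences for such runs. The function name and the toy sequence are hypothetical, and the check does not reproduce BLAST's scoring or filtering:

```python
import re
from typing import List, Tuple


def polyq_tracts(protein: str, min_length: int = 15) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for each run of at least min_length glutamines."""
    pattern = re.compile("Q{%d,}" % min_length)
    return [(m.start(), len(m.group())) for m in pattern.finditer(protein.upper())]


# Made-up toy protein: an 18-Q tract, a proline break, then another 18-Q tract.
toy_hit = "MKT" + "Q" * 18 + "PP" + "Q" * 18
print(polyq_tracts(toy_hit))  # [(3, 18), (23, 18)]
```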
Where are the human proteins?
Right. I said these experiments are hard to repeat.
See ya next time. We'll try to muddle through the mystery and get back on track with the story.