How do you go about researching a genetic disease?
This multi-part series explores how digital resources can be used to learn about Huntington's disease. Reposted and updated from the original DigitalBio.
A bit of background
Alice's Restaurant is a movie with an unforgettable song that mostly revolves around Arlo Guthrie hanging out with his friends. Somewhere in the movie, the conversation turns to Woody, and someone asks the question that no one wants to touch. Does Arlo's girlfriend know about Huntington's? ...dead silence... Now, I did see the movie quite a few years ago, so my memory of the plot is kind of fuzzy but, as I recall, no one in the movie was prepared for that kind of discussion.
It has been a couple of decades, or so, since Alice's Restaurant was made. Woody Guthrie is long dead, but kids still sing This Land is Your Land in elementary school, and people with Huntington's disease (HD) are still without a cure.
HD is a terribly debilitating disease that strikes people in the prime of life and unfortunately, after they're likely to have had children. The disease is inherited in an autosomal dominant manner, which boils down to a 50:50 chance of getting it, if one of your parents has it. All it takes is the wrong copy of chromosome 4, with a few dozen extra nucleotides and you're SOL.
The difference between now and then, though, is that there's a genetic test that can divine your possible fate. A little bit of blood, some enzymes, a way to separate different-sized pieces of DNA, and you can find out whether you'd better go to Disneyland while there's still time or whether you might want to sign up for that retirement plan that people are always telling you about.
Much of our knowledge about HD comes from work by Nancy Wexler. I was fortunate to hear Dr. Wexler talk about HD and her work with afflicted families in Venezuela at the University of Washington, a few years back. If you'd like to hear her for yourself, NPR has an interview with Dr. Wexler that's well worth a listen.
Hunting for reviews
Okay, the disease is horrible, but learning about it is interesting. Perhaps we can even learn some general things about biology along the way.
We'll begin by learning a bit about the gene and the disease. Both GeneTests and the Genetics Home Reference at the National Library of Medicine tell us a bit about the disease symptoms, and that the difference between having HD and not having HD is a few extra CAGs in the huntingtin gene. We can even find a lab that will do a test by clicking links at these sites.
We also find out that the HD gene is called "huntingtin" and is quite large. Huntingtin is over 200,000 bases long and has 67 exons. Plus, the gene is quite polymorphic in its number of CAG repeats. Normal people have 10-35 CAGs in the huntingtin gene, whereas individuals with the disease can have 40-55 CAGs.
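As a small illustration, the two ranges quoted above can be written down as a lookup. This is only a sketch that restates the numbers in this post; the function name and the wording of the labels are made up for the example, and it is not a clinical reference.

```python
# Ranges exactly as quoted in this post; illustration only, not a clinical guide.
NORMAL_RANGE = range(10, 36)    # 10-35 CAGs
DISEASE_RANGE = range(40, 56)   # 40-55 CAGs


def describe_repeat_count(cag_count: int) -> str:
    """Label a CAG count using only the two ranges quoted in the text."""
    if cag_count in NORMAL_RANGE:
        return "within the normal range quoted above"
    if cag_count in DISEASE_RANGE:
        return "within the disease range quoted above"
    # The post gives no figures for counts that fall between or beyond the ranges.
    return "outside both ranges quoted above"
```

Counts that fall between the two ranges are deliberately left unlabeled, since the text doesn't say what happens there.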
Hunting for the gene
To confirm the presence of repeated CAGs, it's nice to be able to find the huntingtin gene sequence ourselves and take a look. If we go to the NCBI, choose the Gene database from the pull-down menu at the top of the page, and type "huntingtin," we get a list of genes that includes the huntingtin gene from multiple species, plus lots of genes for proteins that interact with huntingtin. So we do have to read carefully to pick the link for the right gene.
Going to the HD Gene record gives us lots more info. In the middle of this page is a picture of the introns and exons along with links to reference sequences from the contig (NC), mRNA (NM_002111), and protein (NP_002102). We click NM_002111 and choose FASTA to get the DNA sequence that corresponds to the huntingtin mRNA.
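If you'd rather not click through the web pages, the same FASTA record can probably be pulled with a short script. The sketch below assumes NCBI's E-utilities efetch service and the nuccore database; the function names are invented for this example, and the download itself needs a network connection, so it is left as a commented-out call.

```python
from typing import Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI's public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Build an efetch URL that asks for one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single FASTA record into (header, sequence in upper case)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    header = lines[0][1:]                     # drop the leading '>'
    sequence = "".join(lines[1:]).upper()     # join the wrapped sequence lines
    return header, sequence


def fetch_fasta(accession: str) -> Tuple[str, str]:
    """Download one record and parse it (requires network access)."""
    with urlopen(build_efetch_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("ascii"))


# Example call, commented out because it goes out to the network:
# header, sequence = fetch_fasta("NM_002111")
```

Keeping the URL building and the parsing separate from the download means both can be checked on a toy record without touching NCBI.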
Now, we see a long sequence with lots of A's, G's, C's, and T's. How are we going to find the CAG repeats without going blind staring at a computer screen? No problem, we just need a little fancy footwork with our web browser. Most, if not all, web browsers have a way to search for text on a web page. You can find it by looking through the menus, or use the same key command you would use to find text in Microsoft Word (Command + F on Mac OS X, Ctrl + F on Windows). If you use Firefox, the search feature is really nice. Not only can you find the text, you can highlight every place where it occurs. So, with Firefox, our search results look like this:
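The browser search works by eye. For anyone who would rather let a script do the staring, here is a minimal sketch that looks for back-to-back CAGs with a regular expression. The function names and the toy sequence are made up for this example, and the search ignores reading frame, just like a plain text search does.

```python
import re
from typing import List, Optional, Tuple


def find_cag_runs(sequence: str, min_repeats: int = 2) -> List[Tuple[int, int]]:
    """Return (start position, repeat count) for each run of consecutive CAGs.

    Positions are 0-based. Runs are found left to right without regard to
    reading frame, the same way a text search in a browser would find them.
    """
    pattern = re.compile(r"(?:CAG){%d,}" % min_repeats)
    return [(m.start(), len(m.group()) // 3) for m in pattern.finditer(sequence.upper())]


def longest_cag_run(sequence: str) -> Optional[Tuple[int, int]]:
    """Return the (start, repeat count) of the longest CAG run, or None if there is none."""
    runs = find_cag_runs(sequence, min_repeats=1)
    return max(runs, key=lambda run: run[1], default=None)


# Toy sequence for illustration only; this is not the huntingtin mRNA.
toy = "ATGCAGCAGCAGTTTCAGCAGA"
print(find_cag_runs(toy))
print(longest_cag_run(toy))
```

To try it on the real thing, paste the sequence lines from the FASTA record into a string (without the header line) and pass that string to longest_cag_run.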
Hunting in the animal kingdom
Sure enough, a normal huntingtin gene has 19 CAGs (see above) and the disease-related mutant gene has about twice as many. This is unusual, but it doesn't tell us why that would be a problem. Do the extra glutamines cause Huntington's disease or result from it?
We might be able to answer this question by looking at HD in other animals. If we search for the huntingtin gene at the UCSC Genome Browser, we find that mice (Mus), pigs (Sus), and fish (Danio) all have the HD gene, too, with the same organization of introns and exons. The exons are the straight lines in the picture below. These are the DNA sequences that get copied into mRNA.
Since mice have the huntingtin gene, maybe mutant mice can help us answer this question. The Jackson Labs have been able to make mice with the equivalent of HD by adding extra copies of CAG to the mouse version of the huntingtin gene.
If we go to the Jackson Labs site and search for Huntington, we find examples of mice that develop HD symptoms when they have extra CAGs. This result tells us that the extra CAGs are enough to spur the development of HD.
Still hunting for answers
The role of the extra CAGs is still unclear but we do know that CAG codes for glutamine.
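To make the link between CAGs and glutamines concrete, here is a small sketch that reads a coding sequence three bases at a time and counts the longest stretch of glutamine codons. In the standard genetic code, both CAG and CAA code for glutamine, so the sketch counts either one. The function names and the toy sequence are invented for this example, and the reading frame has to be supplied by the caller.

```python
from typing import List

# In the standard genetic code, glutamine (Q) is encoded by CAG and CAA.
GLUTAMINE_CODONS = {"CAG", "CAA"}


def split_codons(sequence: str, frame: int = 0) -> List[str]:
    """Cut a sequence into complete 3-base codons, starting at offset `frame`."""
    seq = sequence.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def longest_glutamine_tract(coding_sequence: str, frame: int = 0) -> int:
    """Count the longest run of consecutive glutamine codons in the given frame."""
    best = current = 0
    for codon in split_codons(coding_sequence, frame):
        if codon in GLUTAMINE_CODONS:
            current += 1
            best = max(best, current)
        else:
            current = 0   # any other codon ends the tract
    return best


# Toy coding sequence for illustration only: ATG CAG CAG CAA CAG TTT
print(longest_glutamine_tract("ATGCAGCAGCAACAGTTT"))
```

Unlike a plain text search, this only counts CAGs that sit in the chosen reading frame, which is what matters for the protein.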
Since glutamine is able to form hydrogen bonds through the nitrogen and oxygen of the amide group on its side chain (blue and red in the picture below), proteins with extra glutamines might cause problems by binding to other proteins and interfering with their normal activities.
I've been unsuccessful, so far, in finding protein structures with more than three glutamines in a row, but I did find polyglutamine tracts in lots of protein sequences and some tantalizing clues in the literature.
So, tune in next time to learn what we can find when we continue our hunting expedition and venture into the deep, deep, darkness of the digital databanks.