More flu follies: comparing sequences and making trees, activity 4

Sandra Porter

What tells us that this new form of H1N1 is swine flu and not regular old human flu or avian flu?

If we had a lab, we might use antibodies, but when you're a digital biologist, you use a computer.

Activity 4. Picking influenza sequences and comparing them with phylogenetic trees

We can get the genome sequences, piece by piece, as I described earlier, but the NCBI has other tools that are useful, too.

The Influenza Virus Resource will let us pick sequences, align them, and make trees so we can quickly compare the sequences to each other.

This is how I got the sequences that I wrote about yesterday. I think the more people we have looking at sequences, the better off we are.

I'll show you how this works by getting sequences of the hemagglutinin (HA) protein from the recent cases of H1N1 swine flu and comparing them to HA sequences from other H1N1 infections that happened in the past year.

1. Go to the NCBI Influenza Virus Resource (this will open a new window).

2. Start out by getting the sequences from the recent swine flu cases in California and Texas.

To do this, we will pick Influenza A as the virus species, human as the host, North America as the region, and HA as the segment. Protein sequences are selected by default and those are just fine.

Then, we set the date range (year, month, day) to run from 2009, 03, 01 to 2009, 04, 29.

Last, we click the Add to Query Builder button to get the sequences.

I forgot to put this in the image, but I also used a filter to select for H1: I typed "H1" in the really long text box. Note, too, that I was looking at the protein sequences. (We should look at nucleotides, too, but that's a later experiment.)

i-f7453935b883c2e49dfc251cb00b77bc-flu_query1.png
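If you'd like to see the same filtering logic outside the web form, here's a minimal Python sketch. It is not the NCBI's code or API. The `Record` class, the `matches` function, and the "EX" accession numbers are all made up for illustration; only the filter values (Influenza A, human, North America, HA, H1, and the date range) come from the steps above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """A hypothetical sequence record; field names are assumptions."""
    accession: str
    virus_type: str
    host: str
    region: str
    segment: str
    subtype: str
    collected: date


def matches(rec, *, virus_type, host, region, segment, subtype, start, end):
    """Return True if the record passes every filter in the query."""
    return (
        rec.virus_type == virus_type
        and rec.host == host
        and rec.region == region
        and rec.segment == segment
        and rec.subtype.startswith(subtype)  # "H1" matches "H1N1"
        and start <= rec.collected <= end
    )


# Invented records, just to exercise the filter.
records = [
    Record("EX0001", "A", "Human", "North America", "HA", "H1N1", date(2009, 4, 1)),
    Record("EX0002", "A", "Swine", "North America", "HA", "H1N1", date(2007, 8, 15)),
    Record("EX0003", "A", "Human", "North America", "HA", "H3N2", date(2009, 3, 20)),
    Record("EX0004", "A", "Human", "North America", "HA", "H1N1", date(2008, 2, 10)),
]

# The query from step 2, written as keyword arguments.
query = dict(
    virus_type="A",
    host="Human",
    region="North America",
    segment="HA",
    subtype="H1",
    start=date(2009, 3, 1),
    end=date(2009, 4, 29),
)

hits = [r.accession for r in records if matches(r, **query)]
print(hits)
```

Only the first invented record passes: the others fail on host, subtype, or date.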

3. This query finds 7 sequences. If we click the Get Sequences button, we can see that these are the California and Texas isolates.

i-d849cd6aec88770cc6ab41203ca2b17b-h1n1seqs.png

Now, we have to decide which groups we'd like to compare. I decided to compare these to other H1N1 flu sequences and to some sequences from pigs.

4. To get other flu sequences for comparison, I repeated the query from steps 1-2 with some changes.

       a. For one set of sequences, I changed the host to "Swine."

       b. For the other set of sequences, I changed the date range so that I could get older sequences.

       c. Each time I changed the settings, I clicked the Add to Query Builder button.

Now, the Query Builder contains the H1 sequences from the seven US cases, 272 sequences from people who've been infected with H1N1 over the past year in North America, and 5 H1 sequences from pigs.

i-b4ecac7b9e4d3994a71d6efa692eec39-query2.png
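One way to picture the Query Builder is as a list of saved queries whose results get pooled into one list. Here's a minimal sketch of that idea. It assumes a record matched by more than one query should appear only once; I haven't checked how the real tool treats overlaps. The records, the `get_sequences` function, and the accession numbers are hypothetical.

```python
# Invented records; only host and year are modeled here.
records = [
    {"accession": "EX0001", "host": "Human", "year": 2009},
    {"accession": "EX0002", "host": "Human", "year": 2008},
    {"accession": "EX0003", "host": "Swine", "year": 2007},
    {"accession": "EX0004", "host": "Avian", "year": 2008},
]

# Each saved query is a function that says whether a record matches.
query_builder = [
    lambda r: r["host"] == "Human" and r["year"] == 2009,           # recent cases
    lambda r: r["host"] == "Swine",                                 # host changed
    lambda r: r["host"] == "Human" and 2008 <= r["year"] <= 2009,   # wider dates
]


def get_sequences(queries, records):
    """Pool the results of every query, keeping each accession once."""
    picked = {}
    for query in queries:
        for rec in records:
            if query(rec):
                # setdefault keeps the first copy and ignores repeats
                picked.setdefault(rec["accession"], rec)
    return list(picked.values())


combined = get_sequences(query_builder, records)
accessions = [r["accession"] for r in combined]
print(accessions)
```

The 2009 human record matches both the first and third queries but shows up once in the pooled list, and the avian record never shows up at all.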

5. Then, I clicked the Get Sequences button.

This gave me a long list with far more sequences than I wanted to use. I clicked the check box at the top to deselect everything, then used the individual check boxes to select the sequences I wanted to compare.

I sorted by year to make my 2009 cases easier to find. Then, it's time to decide which sequences to pick.
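If you download the list instead, the year is right there in the strain name, so you can sort on it yourself. The sketch below assumes names follow the usual pattern of type/host/place/isolate/year(subtype), with the host left out for human isolates. The `parse_strain` helper is hypothetical, and the names in the list are illustrative examples that I have not checked against the database.

```python
import re

# type / optional host / place / isolate / year (subtype)
STRAIN = re.compile(
    r"^(?P<type>[ABC])/(?:(?P<host>[^/]+)/)?(?P<place>[^/]+)/"
    r"(?P<isolate>[^/]+)/(?P<year>\d{4})\((?P<subtype>H\d+N\d+)\)$"
)


def parse_strain(name):
    """Split a strain name into its parts; raise if it doesn't fit the pattern."""
    m = STRAIN.match(name)
    if m is None:
        raise ValueError(f"unrecognized strain name: {name}")
    return {
        "name": name,
        "host": m.group("host") or "human",  # host is omitted for human isolates
        "place": m.group("place"),
        "year": int(m.group("year")),
        "subtype": m.group("subtype"),
    }


# Illustrative names only.
names = [
    "A/Texas/05/2009(H1N1)",
    "A/swine/Ohio/511445/2007(H1N1)",
    "A/California/04/2009(H1N1)",
    "A/New York/01/2008(H1N1)",
]

strains = [parse_strain(n) for n in names]

# Newest first, then alphabetical by name within a year.
by_year = sorted(strains, key=lambda s: (-s["year"], s["name"]))
recent = [s["name"] for s in by_year if s["year"] == 2009]

for s in by_year:
    print(s["year"], s["host"], s["place"])
```

Sorting newest-first puts the 2009 cases at the top of the list, which is the same trick as sorting by year on the web page.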

Hmmm, of course I picked the seven swine flu cases, then I picked some sequences that were isolated from actual swine, then some other human cases of H1N1 that happened in different parts of North America last year.

At this point, I could download the sequences and work on my own computer, or I could use some of the analysis tools at the NCBI. I decided to let the NCBI's computers do the work, so I clicked the Multiple Alignment button to see the amino acid similarities. Then I clicked the Build a tree button, followed by a lot of Next step buttons.

Here's my tree:

i-05304dd2059fdc1b998a8dca2239c66e-big_tree.png
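To get a feel for what a tree-building step does with an alignment, here's a toy sketch. It measures the fraction of aligned positions that differ between each pair of sequences, then joins the closest pair over and over (a simple clustering method called UPGMA). This is an assumption-laden illustration: I'm not claiming it's the method the NCBI tool uses, the 10-letter sequences are invented and are not real HA, and the labels are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions that differ, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA; return a Newick string (topology only)."""
    # Each cluster is stored as: key -> (newick text, number of sequences)
    clusters = {name: (name, 1) for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    next_id = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        (text_a, size_a) = clusters.pop(a)
        (text_b, size_b) = clusters.pop(b)
        new = f"#{next_id}"                      # internal key for the merged cluster
        next_id += 1
        for c in clusters:
            # Size-weighted average distance from the merged cluster to c
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((new, c))] = (d_ac * size_a + d_bc * size_b) / (size_a + size_b)
        del dist[pair]
        clusters[new] = (f"({text_a},{text_b})", size_a + size_b)
    newick, _ = next(iter(clusters.values()))
    return newick + ";"


# Invented toy alignment, NOT real hemagglutinin sequences.
aligned = {
    "CA_2009": "MKAILVVLLY",
    "TX_2009": "MKAILVVLLH",
    "swine_OH": "MKAIFVALLH",
    "human_2008": "MEVKLLVFLC",
}

tree = upgma(aligned)
print(tree)  # prints (((CA_2009,TX_2009),swine_OH),human_2008);
```

In the toy data, the two 2009 sequences join first, the swine sequence joins them next, and the 2008 human sequence joins last. Real trees need real alignments and better distance models, so treat this only as a picture of the idea.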

After making the tree, I decided to look at all the sequences in my set. Here's what I get from that analysis:

i-883e5784439094fb8f211d747ba5fab8-tree2_small.png

What do I conclude from this? Well, first, it looks reasonable to say that the people in Texas and California were probably infected with the same strain since those sequences cluster pretty closely together.

Second, it looks like the HA protein from the California and Texas strains is most similar to the HA protein from a strain that infected some pigs in Ohio a couple of years ago, and it is not as closely related to the 200-some strains of H1N1 that infected other people in 2008.
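A quick way to sanity-check a statement like "most similar to" without a tree is to compare average percent identity between the new cases and each comparison group. The sketch below assumes the sequences are already aligned. The sequences are invented toy strings (not real HA), and the group names and the `mean_identity` helper are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned positions that match, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def mean_identity(queries, group):
    """Average percent identity over every query/group pair."""
    scores = [percent_identity(q, g) for q in queries for g in group]
    return sum(scores) / len(scores)


# Invented toy sequences, NOT real hemagglutinin.
new_cases = ["MKAILVVLLY", "MKAILVVLLH"]
groups = {
    "swine": ["MKAIFVALLH"],
    "human_2008": ["MEVKLLVFLC", "MEVKLLVFLS"],
}

means = {name: mean_identity(new_cases, seqs) for name, seqs in groups.items()}
closest = max(means, key=means.get)

for name, value in means.items():
    print(f"{name}: {value:.1f}% average identity")
print("closest group:", closest)
```

With the toy data, the swine group comes out closer than the 2008 human group. Averages hide a lot, so this is a rough check to go alongside the tree, not a replacement for it.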

You guys can play amateur epidemiologist, too, and look at other strains or look at the New York strains. I think the more eyes we have looking at these, the better off we are.

We should look at the nucleotide sequences, and it would be good to try other tree methods as well. And, of course, as if things weren't complicated enough, there are 8 different segments of the flu genome.

Have fun!