Welcome, Bio256 students!
This quarter, we're going to do some very cool things: we are going to use bioinformatics resources and tools to investigate biological questions. My goal is for you to remember that these resources exist and, hopefully, to be able to use them when you're working in the biotech world. I don't believe that bioinformatics is a subject you can really grasp without getting your fingers dirty, so this course will include a lot of hands-on work.
My friend and collaborator at Johns Hopkins University (JHU) has given me data sets from the past three years, and we are going to use bioinformatics tools to find out which bacteria live on her campus and where they live.
Before we begin working with the JHU data, though, we will go over what the course will cover.
In our first class, we will talk about DNA sequencing: the data it produces, where those data come from, and how we measure their quality. I have uploaded all of the JHU data into our course's iFinch account so that we can use them to illustrate data quality and what it means for doing good science.
After class, use the reading assignment below as a reference to help you understand the quality values produced by phred. Other base-calling programs, such as KB, produce similar quality values, and we will use KB and phred quality values interchangeably during this course.
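Phred defines a quality value Q in terms of the estimated probability P that a base call is wrong: Q = -10 × log10(P). The short Python sketch below converts in both directions. It is only an illustration of that formula; the function names are our own and are not part of phred, KB, or iFinch.

```python
import math


def error_probability(q: float) -> float:
    """Probability that a base call with phred quality q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred_quality(p_error: float) -> float:
    """Phred quality value for an error probability P: Q = -10 * log10(P)."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    for q in (20, 30, 40):
        p = error_probability(q)
        # Accuracy is 1 - P; for example, Q20 gives P = 0.01 (99% accurate)
        print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.2%}")
```

Each step of 10 in Q corresponds to a tenfold drop in the chance that the base call is wrong.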
We will also start working with blastn tonight to learn how we can use it as a tool to identify unknown sequences.
1. The mysteries of DNA sequencing
DNA sequencing animations (at DNAi)
2. Quality values
3. Using blastn to identify sequences
see the assignment below
Here are your first two assignments (remember, I said this is pretty hands-on):
Assignment 1. Quality values and DNA sequencing
I'll assign the chromatograms and we'll work on this during class.
1. What does a quality value of 20 mean? What about a quality value of 30? Of 40?
2. If you have 340 Q20 bases, what does that mean?
3. If you look at Q20/length for different reads, you will see different values.
a. Would a value of 0.34 correspond to a high quality sequence or a low quality sequence?
b. What Q20/length value would you have if all the bases in your sequence were of high quality?
4. What kind of quality score would you expect to see if you were looking at a sample of human DNA that contained a SNP at the position you were viewing?
5. Imagine that you're an MD and you have a patient who has just been tested for their ability to metabolize Coumadin. You decide to use DNA sequencing to look at a specific gene and determine the proper dose for this person, since you know that the ability to metabolize this drug is genetic. It's important to know whether the lab's result is right, since the wrong dose of this drug can lead to death. (If you're interested, you can learn more about the genetics of Coumadin metabolism from this video of Debra Nickerson talking about the genetics of drug response.)
Imagine now that the lab sent you a result like this:
AGATACTAGCATCATCAGAT
Let's say we know that the sequence came from the correct gene. (I made this sequence up, so in reality it would not.)
What other data would help us determine if this sequence was likely to be correct?
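For questions 2, 3, and 5, it can help to see how per-base quality values turn into summary numbers. The Python sketch below assumes that "Q20 bases" counts the bases with a quality value of 20 or higher and that "Q20/length" divides that count by the read length. The quality values paired with the made-up sequence from question 5 are hypothetical, invented only for illustration, and the function names are our own.

```python
from typing import List, Sequence


def q20_count(qualities: Sequence[int], threshold: int = 20) -> int:
    """Number of bases whose quality value is at or above the threshold."""
    return sum(1 for q in qualities if q >= threshold)


def q20_fraction(qualities: Sequence[int], threshold: int = 20) -> float:
    """Q20 bases divided by read length (0.0 for an empty read)."""
    if not qualities:
        return 0.0
    return q20_count(qualities, threshold) / len(qualities)


def low_quality_positions(qualities: Sequence[int], threshold: int = 20) -> List[int]:
    """1-based positions of bases below the threshold, i.e. calls worth double-checking."""
    return [i for i, q in enumerate(qualities, start=1) if q < threshold]


if __name__ == "__main__":
    # The made-up sequence from question 5, paired with HYPOTHETICAL quality values
    read = "AGATACTAGCATCATCAGAT"
    quals = [12, 25, 38, 40, 40, 35, 40, 40, 18, 40,
             40, 40, 33, 40, 40, 28, 15, 40, 22, 9]
    assert len(read) == len(quals)
    print("Q20 bases:", q20_count(quals))
    print("Q20/length:", round(q20_fraction(quals), 2))
    for pos in low_quality_positions(quals):
        print(f"position {pos}: {read[pos - 1]} (Q{quals[pos - 1]})")
```

The threshold parameter defaults to 20, so the same functions can report Q30 or Q40 counts by passing a different cutoff.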
Assignment 2. Using blastn to identify sequences
Copy one of the unknown sequences from the BLAST for beginners activity, and use it as the query sequence for a blastn search.
Here are your tools:
1. The unknown sequences: BLASTing through the kingdom of life. You'll be assigned a sequence tonight in class.
2. An animated tutorial showing how to run a blastn search and interpret some of its results: BLAST for beginners.
3. A PDF document with the questions that you need to answer and an example.
4. Another set of solved examples, with more pictures.