Wednesday, December 19, 2007 - 01:13
In which we're reminded that database searches are experiments, too.
One of the trickiest things about bioinformatics experiments is repeating them. The challenge isn't related to the validity of the original results. It's that, unless you made your own database and kept it in the same state, the database you'll be using later, sometimes even a day later, is a different database. And if you query a different database, you may get a different result.
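One way to make a search easier to revisit is to write down exactly what was asked, of which database, and when, along with the identifiers that came back. Here's a minimal sketch of that idea in Python. The `SearchRecord` class, its field names, and the `compare_hits` helper are all made up for illustration; they aren't part of any NCBI tool.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date


@dataclass(frozen=True)
class SearchRecord:
    """Minimal provenance for one database search (hypothetical structure)."""
    program: str       # the search program, e.g. "blastp"
    database: str      # database name, copied from the search page
    query: str         # the exact query sequence or search term
    hit_ids: tuple     # identifiers of the hits, in the order returned
    # Defaults to the day the record is created, in ISO format.
    run_date: str = field(default_factory=lambda: date.today().isoformat())


def save_record(record, path):
    """Write the record to a JSON file so it can be compared with later runs."""
    with open(path, "w") as handle:
        json.dump(asdict(record), handle, indent=2)


def compare_hits(old, new):
    """Return (lost, gained): hits only in the old run, hits only in the new run."""
    old_ids, new_ids = set(old.hit_ids), set(new.hit_ids)
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)
```

A record like this won't bring the old database back, but it does let you tell "the database changed" apart from "I typed something different this time."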
The series that I'm currently posting is one that I started working on a couple of years ago. Originally, I was going to repost these stories as is, but it seemed best to add another twist and see if I could reproduce some of the results, or at least find out which results have changed. In the next few posts, you'll see the results of those experiments.
Playing catch-up with the latecomers
Hi, for those of you who've just joined us, we've gotten lost in some databases while hunting for information on huntingtin. If you'd like to catch up a bit and come back later, you might want to read Hunting for huntingtin (part I).
If not, here's a brief synopsis of the plot and what we've done so far:
- learned about Woody Guthrie and Nancy Wexler
- found a couple of reviews describing Huntington Disease
- got the HD gene sequence and counted the number of CAGs
- learned that CAG codes for glutamine and that glutamine can form hydrogen bonds
Looking for other structures
Okay, so what can I do now? What would you do?
I decided to do a blastp search, since NCBI has this cool new feature where protein sequences that have a corresponding structure are linked to the structure record in the MMDB.
So I used blastp to search the human protein database with a sequence of 15 glutamines.
What did I find?
In 2005, my search gave this result: No significant similarity found.
This year, I got results.
But they're strange.
I have some perfect matches to things that I've never heard of, like Vanderwaltozyma polyspora and Brugia malayi, and to some things that I have heard of, like Anopheles gambiae (some type of mosquito) and Chlamydomonas.
Where are the human proteins?
Right. I said these experiments are hard to repeat.
See ya next time. We'll try to muddle through the mystery and get back on track with the story.

