Week 6: So this gene walks into a bar and says BLAST me
Comparative genomics relies heavily on methods that compare genes from one species to another. A popular tool for comparing genes or proteins is BLAST (Basic Local Alignment Search Tool). Over the summer, I have been using BLAST to compare cancer-related proteins in the naked mole rat to proteins with similar function and sequence in human, mouse and dog. When I present my work to other students and faculty, I am often met with a resounding, “What is BLAST?”
The first time I was asked that question, I thought the student was from another planet, because who hasn’t heard of BLAST?! However, as I was asked the question more and more often, I realized that many people have heard of BLAST yet don’t understand how it works. Here, I will attempt to outline the basic concepts of BLAST. In addition, I will give examples of how BLAST has changed our daily lives.
I think of BLAST as a little perfectionist elf with a PhD in mathematical modeling. When I submit a sequence, he takes it and quickly skims it, looking for short matches between my sequence and the sequences he already knows, and uses this information to identify my sequence as belonging to a specific family. Once he identifies my sequence’s family, he pulls out a box containing all of the sequences in that family and begins to compare my sequence to the sequences in the box.
The BLAST algorithm relies on a scoring matrix, which is a table that assigns a score to every possible pairing of residues (nucleotides or amino acids). The elf aligns my sequence with an individual sequence from the box and awards points for each pair of residues that is identical or similar. The points add up, so long stretches of aligned residues earn higher scores. The BLAST elf gets really excited when long stretches from conserved regions align.
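To make the point system concrete, here is a minimal sketch in Python. It is not BLAST's real scoring table: the residue groups and the point values (+2, +1, -1) are toy numbers I made up for illustration, and the function names are hypothetical. Real protein searches use a published substitution matrix such as BLOSUM62.

```python
# Toy amino acid groups (an assumption for illustration only).
# Real BLAST looks scores up in a substitution matrix such as BLOSUM62.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def residue_score(a, b):
    """Score one aligned pair of residues with made-up point values."""
    if a == b:
        return 2   # identical residues
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1   # different residues with similar chemistry
    return -1      # dissimilar residues are penalized


def alignment_score(seq1, seq2):
    """Sum the pair scores of two equal-length sequences aligned without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch only scores ungapped, equal-length alignments")
    return sum(residue_score(a, b) for a, b in zip(seq1, seq2))


print(alignment_score("MKV", "MKV"))  # all three residues identical
print(alignment_score("MKV", "MRL"))  # one identical, two similar
print(alignment_score("MKV", "MDW"))  # one identical, two dissimilar
```

Because the score is a sum over aligned pairs, a long well-matched stretch naturally outscores a short one, which is what makes the elf so happy about conserved regions.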
The elf really likes aligning things in threes. He initially searches for three residues from the conserved family sequences that align. These residues act as the starting point from which he extends the alignment in both directions by groups of three. As I mentioned earlier, the BLAST elf is a perfectionist: he wants a perfect alignment, and anything that doesn’t align is given a penalty. He faults any group of three residues that doesn’t align. Together, his points and penalties produce a score that indicates the degree of similarity between your sequence and each sequence in the box. The significance of each score is evaluated relative to the similarity scores of the other sequences in the box.
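The elf's routine of finding a three-residue starting point and then extending it can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the real BLAST code: seeds must match exactly, the alignment has no gaps, extension proceeds one residue at a time, and the scoring values and the `x_drop` cutoff are made up. The names `find_seeds` and `extend_seed` are hypothetical.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-residue word matches exactly."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed in both directions without gaps.

    Returns (best_score, query_start, query_end, subject_start), with
    query_end exclusive. Extension in a direction stops once the running
    score falls more than x_drop below the best score seen so far.
    """
    score = best = k * match

    # Extend to the right of the seed.
    best_right, step = 0, 0
    qi, sj = i + k, j + k
    while qi < len(query) and sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        step += 1
        if score > best:
            best, best_right = score, step
        elif best - score > x_drop:
            break
        qi, sj = qi + 1, sj + 1

    # Extend to the left, starting again from the best score so far.
    score = best
    best_left, step = 0, 0
    qi, sj = i - 1, j - 1
    while qi >= 0 and sj >= 0:
        score += match if query[qi] == subject[sj] else mismatch
        step += 1
        if score > best:
            best, best_left = score, step
        elif best - score > x_drop:
            break
        qi, sj = qi - 1, sj - 1

    return best, i - best_left, i + k + best_right, j - best_left


query, subject = "XXMKVLAYY", "ZZZMKVLAQQ"
for qi, sj in find_seeds(query, subject):
    score, q_start, q_end, s_start = extend_seed(query, subject, qi, sj)
    print(qi, sj, score, query[q_start:q_end])
```

In this toy example every seed grows into the same stretch, `MKVLA`, because the mismatches on either side pull the score down and the extension is trimmed back to its best-scoring point.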
BLAST greatly shortens the time it takes to understand what a novel gene or protein does. When BLAST was published in 1990, it was the year’s most cited science paper. BLAST has been used to identify genes in species such as mouse and dog. By taking a dog gene and BLAST-ing it against the human genome, you can identify a gene that is highly similar between the two species. BLAST results can also feed further analysis, such as building phylogenetic trees that show how closely a group of species is related. Similarly, unknown genes can be assigned a putative function based on their BLAST hits, and they can be mapped to their position on a chromosome based on a BLAST search. Genes or proteins shared between two or more species can be compared to identify conserved domains of that gene family.
BLAST is a great tool, and explaining it is much more difficult than using it. The image of an elf trying to align my sequence with another, driven by his obsession with multiples of three, has helped me visualize the process.
About the Blogger
Natalie Punt caught the science bug at her grammar school’s science fair. She followed her passion for model biological systems as a microbiology major at UC Davis. Natalie has developed and characterized a mouse model of tumor angiogenesis, engineered a 3-D model of tumor angiogenesis and most recently contributed to the characterization of the epigenetic regulation of Mixed Lineage Leukemia. Her work has earned recognition at numerous science conferences and publications in peer-reviewed journals. This summer Natalie submitted a grant proposal to Merial/Merck and successfully received funding to identify proteins responsible for the anti-cancer phenotype in animals by comparing the cancer-associated proteins to similar proteins found in humans. In her free time she enjoys running around the Charles, reading design books and learning to speak Italian.