The modern era of biology is shooting forward on the back of computational progress. Harnessing big data is a vital part of being a competitive and skillful modern scientist. However, the traditional biology curriculum provides little computational training, and it can be incredibly easy to feel far behind peers with formal computer science training. As a molecular biologist, I questioned whether I could become an adept computational scientist without that formal training!
After tackling a computational project in my first rotation of graduate school, I learned that every pipette-wielding Scientista can learn computational skills and keep up with the computer nerds.
Here are some important tips to get you started:
2) Choose between being a biologist with computational skills versus a computational biologist: Both types of scientist are extremely necessary for the field of biology! Computational biologists often have a computer science background and develop advanced algorithms for novel computational research. However, being a biologist with everyday computational skills in your repertoire is incredibly valuable. You don’t need formal training, and you can write your own messy but effective code to analyze data you generate yourself. Where you choose to fall on the spectrum is up to you!
3) Decide what languages you’ll learn:
Unix Command Line / Bash: Learn it! This is the interface in your computer’s ‘terminal,’ which allows you to manipulate, view, and move files, plus run programs.
Python: Learn it! It’s an intuitive and straightforward general-purpose language that is also incredibly versatile, with many pre-made Python-specific tools to manipulate and analyze big biological data sets.
Perl: Probably don’t need it! This can be an alternative to Python in terms of purpose, but it’s not as intuitive or versatile. Although useful in very specific contexts (certain genomics tools require Perl, for example), it’s not the most efficient way to maximize your time.
R: Learn it! R is a specialized scientific computing language for statistics and data visualization. A huge number of tools have been written for biological analysis in R, especially for use in genomics and statistics.
MATLAB: Probably don’t need it! This language is similar to R, but it is not as well designed for statistics, and its graphs aren’t as pretty. It is popular in machine learning work and engineering.
C, C++, Java: Learn them later! They are not as appropriate for daily analysis, but they are used to build heavy-lifting algorithms and tools. These languages are important for computationally intensive and advanced programming, but they aren’t the best place to start.
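To give a sense of what the "messy but effective" everyday Python mentioned above can look like, here is a minimal sketch that uses only built-in Python features. The function names and the example sequence are made up for illustration; they do not come from any particular library or lab pipeline.

```python
# Illustrative sketch only: the function names and the example
# sequence are hypothetical, not taken from a real tool.

def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    # Swap each base for its partner, then reverse the string.
    return seq.upper().translate(complement)[::-1]


if __name__ == "__main__":
    example = "ATGCGCTA"  # made-up sequence
    print(gc_content(example))          # 0.5
    print(reverse_complement(example))  # TAGCGCAT
```

A script like this can be saved as a file and run from the Unix command line, which is one reason the command line and Python make a good pair to learn first.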
4) Instructed vs mentored learning: The most efficient way to learn programming is a combination of the two. Use your own reasoning and online resources in combination with a willing and trusted mentor. First, teach yourself the basic jargon, syntax, and logic of programming by taking an online tutorial. Codecademy is my favorite Python tutorial. R has some okay basic introductions, but most R-related questions can be answered through the magic of Google.
However, independent resources will sometimes fail you. If so, first try once more to figure it out yourself. These breakthroughs can be the most satisfying educational experiences! If this doesn’t work, ask your mentor. You will become a better programmer if you have someone more experienced whom you can ask hard questions. Many of my programming problems had solutions so random that the only way to find them was to ask someone who’d been there before! If you can’t find an online answer to your computational question, your mentor may have a quick solution.
5) Jump right in: Implement these skills in a project in your lab! Ask if there is any basic data analysis to perform, and in your free time, begin working on it. I asked many people what I could do to develop my computational skills, and all of them said to work on a computational project. Be sure to ask for something basic to start with, and work your way up from there.
6) Remember when you first started benchwork?: Whether it was two or ten years ago, try to recall how overwhelming it was. You didn’t know how to hold a pipette, let alone perform a Western blot. All the jargon, organization, and ways of thinking were completely new and intimidating. But you conquered wet-lab skills, and you will conquer computational skills! You once learned the difference between an Eppendorf tube and a Falcon tube – now you’re learning the difference between a directory and a file.
Good luck – you’ll be a computational Scientista before you know it!
Molly Gasperini is a first-year graduate student in the Department of Genome Sciences at the University of Washington. After four years working in traditional human genetics, she hopes to become a molecular biologist with computational skills in genomics. Her favorite computational textbook is the O’Reilly R Graphics Cookbook because there is a reindeer on the cover.