I often get questions about bioinformatics, bioinformatics jobs and career paths.
Most of the questions reflect a general confusion between creating bioinformatics resources and using them. Bioinformatics is unique in this sense. No one confuses writing a software package like Photoshop with being a photographer, yet for some odd reason, people seem to expect biologists who use bioinformatics tools to be able to build them. By the same token, the programmers and database administrators who work in bioinformatics are unfairly assumed to have had graduate-level training in biology.
In many ways, it's easiest to understand what bioinformatics is, and to choose a bioinformatics-related career, by dividing the field's participants into two groups: the tool builders and the tool users. The tool builders are the programmers, architects, computational biologists, and computer scientists who write new algorithms, create databases, and build software systems. The tool users are the biologists.
As far as careers go, job descriptions that include "bioinformaticist" or "bioinformatics programmer" usually apply to the tool builders. The jobs where people use bioinformatics are more biology-related. A wider variety of careers use bioinformatics resources, but the word "bioinformatics" won't appear in the job title, and the people using the tools might not even know they're using bioinformatics, especially in the case of databases like PubMed.
The kind of bioinformatics I teach is directed towards the tool users, whether technicians, wet-bench biologists, or, like me, biologists who've gone digital. I teach instructors and students how to use the tools. In my classes, we use bioinformatics resources to learn about biology. Nevertheless, I would like to have good answers for the future bioinformaticists and instructors who ask me what kinds of languages and subjects to study or teach.
Working for a few years in a software company taught me something about the activities in both the builder and user camps, but it's nice to have a more detailed and comprehensive reference to cite. That's why I was really happy to read this article in PLoS Computational Biology: "A Quick Guide for Developing Effective Bioinformatics Programming Skills" by Joel Dudley and Atul Butte (1). This is the article I will recommend to students on the tool-building path and to instructors who wish to help them.
My favorite parts were the sections on UNIX skills (I love UNIX!), structuring data, and valuing your time. My only complaints are minor. I was puzzled by the comment that SQL statements are peculiar. It would also have been nice to see some discussion of HDF5 and BioHDF. This topic would have fit well in both the structuring data and valuing your time sections. BioHDF supports rapid development because it has a hierarchical data model, a binary file format, and a collection of APIs (2). (BioHDF is an open-source collaboration between The HDF Group and Geospiza. You can read more about it in Advances in Computational Biology, part of the book series Advances in Experimental Medicine and Biology, AEMB, published by Springer.) The best part of the article, though, is that the authors get it. They understand what biologists want. Quoting from the PLoS article:
The success of bioinformatics software is based not on the elegance of the software design, but rather its utility as a tool for driving and answering biological questions. Consequently it is no surprise that many successful bioinformatics apps are written by biologists who lack formal computer science training, as they undoubtedly put scientific utility ahead of architectural elegance and completeness.
This is an important point for aspiring bioinformaticists to remember.
1. Dudley, J., & Butte, A. (2009). A Quick Guide for Developing Effective Bioinformatics Programming Skills. PLoS Computational Biology, 5(12). DOI: 10.1371/journal.pcbi.1000589
2. Mason, C., et al. Standardizing the Next Generation of Bioinformatics Software Development With BioHDF (HDF5). In Advances in Computational Biology, Springer (in press).