Tuesday, July 31, 2007 - 04:07
I don't usually blog about work, for a wide variety of reasons. But last week, since I wanted to write about bioinformatics software companies, I broke with tradition and wrote about Geospiza as an example. Naturally, I got some feedback about this. Some people liked it, but one of the most opinionated people said that I had given the software engineering and IT side short shrift and that I should write about that side a bit more. Today is my attempt at a remedy.
The tip of the iceberg

This diagram was given to me to use as an example of the engineering and IT areas in bioinformatics companies.
Lots of people think that bioinformatics only concerns the algorithms, like Smith-Waterman, FASTA, phrap, or BLAST. Or they think it's the visual reports that you get from web servers like the UCSC genome browser, or the web forms where you request data from GenBank, or the databases like the PDB, or the graphical interfaces (GUIs) that you use to work with your data, like FinchTV. But that's only the tip. There's much, much more going on below the surface. And of course, this is where I get into deep water.

Don't blame me, I'm not a native speaker

Bear in mind, I'm a biologist, so when it comes to writing about IT, I will certainly get a few things wrong. Working in a software company, I continually encounter worlds that I never knew existed, and things that make me feel ignorant, so you can all feel free to correct me in the comments section if I make a mistake with some of the technology, terminology, or acronyms.

Take databases, for example. I've decided that databases are probably at least as different from each other as different species of animals. You think they'll all behave the same, but no. Some don't like capital letters or spaces; some have different limits on the amount of text that can be entered. Even the language that you use to ask the database questions has different dialects. Some databases like one version of SQL (structured query language); other databases use SQL with slight variations. (And that's only when they're ANSI compliant.) Even a single flavor of database will behave differently when it holds different amounts of data, or lives on a different operating system. And then there are things like vacuuming and tuning. Databases need to be vacuumed? Who knew?

A very brief description of the IT & engineering side

Even buying computers isn't as simple as you might imagine. Servers are not like laptops. You have to make sure that the processors are up to the tasks that you have in mind.
There has to be sufficient RAM, hard drive space, and back-up systems. (Hi! I'm Linux and I'll be your server today. Would you like that RAID 0, 1, or 5?) So, some of the people who work at our company are very focused on all the subtleties of shopping for equipment and knowing which types are compatible with which databases.

We also have experts in databases: not just the knowledge architecture, but the methods for backing up information, checking integrity, measuring performance, tuning queries, upgrading systems, and things that to me are as comprehensible as metaphysics.

All the variety, of course, with different kinds and versions of databases, different operating systems, and different versions of our own software, would leave us with an incomprehensible mess if we didn't have methods for tracking everything: version control, building software every night, documenting what's been done, APIs, and application frameworks.

The people who work on testing our systems and software have to be very organized, creative, and methodical to make sure that the most important features have been tested with multiple databases, multiple operating systems, and multiple web browsers.

One of the commenters on a previous post mentioned a tendency to view programming/software engineering as a skill akin to "advanced typing." Nothing could be further from the truth. Certainly, we have people involved in designing software who have lots of lab experience and sometimes master's degrees or Ph.D.s in biology. That background is essential when it comes to designing software that will be useful for working with biological data. But that's only a small part of the iceberg.

I almost forgot: every level of that iceberg requires some kind of documentation, written for its own specialized audience. We hire software engineers, database specialists, software architects, programmers, technical writers, and others, because they have specialized technical knowledge in different areas.
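
To give a rough feel for why testing across multiple databases, operating systems, and web browsers takes so much organization, here is a minimal Python sketch that simply counts the combinations. The platform names and the `build_test_matrix` helper are hypothetical placeholders chosen for illustration; they are not a list of what any particular company supports or how it runs its tests.

```python
from itertools import product

# Hypothetical platform lists, chosen only for illustration.
DATABASES = ["PostgreSQL", "MySQL", "Oracle"]
OPERATING_SYSTEMS = ["Linux", "Windows", "macOS"]
BROWSERS = ["Firefox", "Safari", "Internet Explorer"]


def build_test_matrix(databases, operating_systems, browsers):
    """Return every (database, operating system, browser) combination."""
    return list(product(databases, operating_systems, browsers))


matrix = build_test_matrix(DATABASES, OPERATING_SYSTEMS, BROWSERS)
print(f"{len(matrix)} configurations to test")

# Show the first few combinations as a sample.
for database, operating_system, browser in matrix[:3]:
    print(f"- {database} on {operating_system}, viewed in {browser}")
```

Even with only three choices in each category, there are 27 configurations, and every additional database version or browser multiplies the total. That is presumably part of why testers have to be methodical about which features they cover on which combinations.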
It takes expertise in many areas to create software that will withstand the test of time.

Read the whole series:
- Part I: Careers in biotechnology. A look at the jobs in a biotech company making biomedical products.
- Part II: Bioinformatics. Where does bioinformatics fit into a biotech company? Who makes bioinformatics tools? Who uses them?
- Part III: Life in a bioinformatics software company. How do people work together to make bioinformatics software?
- Part IV: The tip of the informatics iceberg. What about the software engineering and IT side of bioinformatics software companies?