In my last three posts I talked about the highlights of the 2014 Intelligent Systems for Molecular Biology conference in Boston, MA. In addition to all the academic talks and events, a few industry presentations stood out. I was pleased with the industry presence at the conference. As a student who may look for an industry job after graduating, I enjoyed the chance to talk with some potential employers and see what kinds of positions are available for people with a bachelor's degree in comp bio.
Good news: every company I talked with seemed willing to hire a programmer or data scientist with a bachelor's degree. These positions typically weren't advertised on their websites, so I get the feeling it takes some networking to actually get hired. It was definitely an encouraging experience, though!
A few of the industry partners gave presentations during the workshop sessions at ISMB. I attended two interesting talks:
Appistry and the “pipeline challenge”
Appistry (St. Louis, MO) develops high-performance computing solutions and software for genomic analysis. Have you heard of the Genome Analysis ToolKit, the software developed by the Broad Institute for variant discovery and genotyping in next-generation sequencing data? Well, the Broad chose Appistry as the commercial partner for the GATK.
The speaker first highlighted Ayrris, Appistry's high-performance computing platform. I didn't catch the technical details, but it sounds like Ayrris has built-in support for troubleshooting genomics pipelines (something I spend so much of my time doing).
He then talked about the Pipeline Challenge. Appistry is sponsoring a contest for the best genomic analysis pipeline ideas, and the winner will receive $70k in bioinformatics software and computer hardware. The Neretti lab has developed a few pipelines and ideas that would fit this contest well… I'm going to look into submitting one! Perhaps the new work we've been doing on the human "virome" and its role in cancer and disease?
Seven Bridges Genomics on bioinformatic reproducibility
Seven Bridges Genomics (Cambridge, MA) also develops software and pipelines for bioinformatic analysis. The focus of their presentation wasn't on the pipelines themselves, though, but on methods and software they're developing to improve reproducibility in genomics analysis. The speaker made a good point early on: publications often include only a diagram of the software pipeline in the methods section. When other researchers try to replicate the results, either with their own software or with the code published alongside the paper, the analyses often don't line up (and sometimes fail entirely). This is a huge problem in computational biology and bioinformatics; Titus Brown frequently blogs about reproducibility, and much of the BOSC Special Interest Group meeting focused on it as well.
Seven Bridges is proposing a solution based on their software platform Rabix. They plan to use Docker images to distribute the software used for the analysis in a publication, bundled together with all of its dependencies. Docker is a lightweight way to package software so that it runs the same way in different computing environments, making it an alternative to the bulky virtual machines that are sometimes published in an effort to distribute code. According to Seven Bridges, "With Rabix, data, tools and pipelines can be published in open repositories which will enable the community to both host and reuse them on their own infrastructure. This way, we can share the analysis itself, show instead of tell, and create reproducible building blocks to further research."