Thoughts from the SEA-PHAGES symposium

What a weekend! The past two days have been filled with excellent student presentations, ample opportunities for networking and fruitful conversations about future research and teaching ideas. Chen and I presented our poster about alignment-free sequence analysis techniques applied to mycobacteriophage genomes on Saturday night. We must have done something right, because we came back to Janelia Farm this morning with a first place ribbon on our poster! Chen also gave his oral presentation this morning and absolutely knocked it out of the park – people have been coming up to us all day and asking how the animations were done.

We’re going to be putting up a web page summarizing our presentation, poster, results and methods in the next few days. For now, you can view the poster and check out our (unfinished) code at my GitHub. I’ll make another post here when everything is ready!

I was also very impressed with some of the research happening at other schools in the SEA-PHAGES program, and will be writing about some of it in the next few days. For now, check out some photos from Janelia.

SEA-PHAGES symposium 2014

This weekend I’m down at HHMI’s Janelia Farm Research Campus at the SEA-PHAGES undergraduate research symposium. The phage hunters class I TA is administered through HHMI and is taught at over 70 schools around the US and internationally. This symposium is a chance for undergraduates from all the schools to get together, present their research and be exposed to new ideas. Chen (one of the first year students) and I are presenting our research into tetranucleotide usage in mycobacteriophage genomes. We’ll have a poster at the session on Saturday night and Chen will be giving an oral presentation on Sunday morning.

Janelia Farm is an inspiring place to visit – something about the beautiful architecture coupled with cutting edge research really sticks with you. I hope to come back to Providence with new connections, ideas and inspirations.

Check out the poster we’ll be presenting and feel free to leave a comment with any questions about the research, phage hunters, or the symposium in general.

Counting tetranucleotides in mycobacteriophages

As a teaching assistant in Brown’s first year seminar “Phage Hunters” I lead several freshman biology and computer science students in an independent bioinformatics research project. We began the semester looking for evidence of CRISPR protospacers in mycobacteriophage genomes. The idea was to use BLAST and other tools to introduce students to the bioinformatics investigation process. We covered the basics of the CRISPR/Cas system, wrote a Python script to download genome sequences from phagesdb.org, and made a local BLAST database on Brown’s computer cluster.

Things were going well with the project, but a few weeks in I was having doubts as to how statistically valid our protospacer predictions were. Then, I re-read a paper by one of the leaders in the field and discovered that they had a) already looked for protospacers and b) found no conclusive evidence in mycobacteriophages. The author of the paper was also going to be at the SEA-PHAGES symposium where we were planning to present our class results, so that really spelled the end of the CRISPR project. We needed a new idea, though – the course instructors were counting on the bioinformatics team to generate some research we could bring to the symposium. My solution: frantic searching on Google Scholar for anything relevant to bioinformatics and bacteriophages.

Within a few minutes I came upon a paper (1) that looked at the usage of tetranucleotides in viral and bacterial genomes. The idea is that closely related genomes have similar signals in terms of tetranucleotide usage, and this signal can be used to look at relationships independent of alignment-based techniques. I had found a new idea for the project! This kind of analysis was also perfect for teaching bioinformatics. It introduces a lot of the concepts and language used in the field, like k-mer counting and normalization. It is fairly straightforward to program, easy to apply to bacteriophage genomes and doesn’t require complicated statistics in a first level investigation.
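To give a feel for what k-mer counting and normalization look like in practice, here is a minimal Python sketch. It is not the code from our project: the function names are made up for illustration, and dividing by the total count is just the simplest possible normalization (published methods, including the paper above, may normalize differently, for example against expected frequencies).

```python
from collections import Counter
from itertools import product


def count_kmers(sequence, k=4):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_frequencies(sequence, k=4):
    """Turn raw counts into relative frequencies over all 4**k possible k-mers.

    Assumption: plain relative frequency is used as the normalization here.
    """
    counts = count_kmers(sequence, k)
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


# Toy sequence, not a real phage genome.
freqs = kmer_frequencies("ACGTACGT")
print(freqs["ACGT"])
```

Normalizing matters because genomes differ in length: raw counts from a long genome would swamp those from a short one, while frequencies put every genome on the same scale.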

I ran with this idea for the bioinformatics project and the results were quite exciting. We found tetranucleotide usage was well conserved within mycobacteriophage clusters (a way to group phage based on pairwise nucleotide alignment and gene content comparisons) and divergent between clusters. We built phylogenetic trees that closely corresponded to published trees, looked for horizontal gene transfer and were able to accurately cluster unknown phage – all based on the usage of 4-letter words within the genomes. For a more detailed overview of the work, check out the abstract I submitted for the International Society for Computational Biology Student Council conference.
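As a rough illustration of how an unknown phage could be assigned to a cluster from tetranucleotide usage alone, here is a small Python sketch. Everything in it is an assumption made for the example: the Euclidean distance, the nearest-neighbor rule, the function names and the toy sequences are all hypothetical, and our actual analysis may have used a different metric and method.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def profile(sequence):
    """Return the tetranucleotide frequency vector of a sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)  # ignores non-ACGT windows
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def distance(p, q):
    """Euclidean distance between two profiles (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_cluster(unknown, references):
    """Label of the cluster whose member is closest to the unknown genome.

    references maps a cluster label to a list of genome sequences.
    """
    query = profile(unknown)
    best = min(
        (distance(query, profile(seq)), label)
        for label, seqs in references.items()
        for seq in seqs
    )
    return best[1]


# Made-up toy sequences standing in for genomes from two clusters.
references = {"A": ["ACGT" * 10], "B": ["GGCC" * 10]}
print(nearest_cluster("ACGTACGTACGTACGA", references))
```

With real genomes, the same pairwise distances could also be fed into a tree-building method, which is one way to get from usage profiles to a phylogeny without any alignment.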

One of the first year students, Chen Ye, and I are also going to be presenting this research at the SEA-PHAGES symposium at HHMI’s Janelia Farm this weekend. Check back for an update with our poster and other thoughts from the conference!

1. Pride, D.T., Wassenaar, T.M., Ghose, C., and Blaser, M.J. (2006). Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics 7, 8.