ISCB Student Council Symposium 2017

Each year, the International Society for Computational Biology Student Council (ISCB-SC) organizes a conference for students and early career scientists in computational biology. The Student Council Symposium (SCS) is typically held the day before the Intelligent Systems for Molecular Biology (ISMB) conference and welcomes scientists from all over the world. As one of the organizers of SCS this year, I had to be in Prague to administer the conference and deal with last-minute issues. Check out these links if you're interested in the Student Council (twitter), or if you want to read some writing I've done in the past on planning an international conference.

We had 3 excellent keynotes this year:

  • Dr. Christine Orengo, professor at UCL and protein structure expert. Dr. Orengo gave an overview of her research, but spent most of her time giving advice to young scientists. A major point she stressed was to carve out your own niche in the research world: find an area that combines what you're good at, what interests you, and where the field isn't too crowded. There, you can maximize the impact of your work and be most successful without excessive competition. Dr. Orengo also spoke about the importance of good relationships with your competitors. My takeaway from the lecture: "keep your collaborators close, but keep your competitors closer." Not to compare scientific research to The Godfather, but the point was that you should treat your competitors well. You can learn from them, be motivated by them, and might even end up joining forces in the end!
  • Dr. Johannes Söding, professor at the Max Planck Institute for Biophysical Chemistry and another protein sequence, structure and homology expert. His lecture focused more on the research his team had been doing. Quite successfully, I might add, as we had two of his students presenting at SCS!
  • Fiona Nielsen, founder and CEO of Repositive. Fiona talked about her transition from academia to private industry. Deep in the research process, she found it almost impossible to identify or access datasets that would support her project. It's a problem I've seen over and over again in my own research: there are many databases of genotyping or gene expression data, each with their own datasets, formats and access rules. After identifying the data you need, it can take months to be approved if the dataset has restricted access (for sensitive patient information or germline mutation status). Frustrated by these repeated roadblocks, Fiona was driven to solve this problem. She first established a charity (DNAdigest) and then a company (Repositive). It was interesting to hear Fiona's take on this winding career path, and helpful to be reminded that pure research isn't the only path that can have a positive impact on patients.

Another highlight of this year was the flash presentations: 5 minutes, 2 slides and 1 chance to sell your work to the crowd. Flash presentations gave many more people a chance to speak; we had 12 this year. I was worried that keeping people to the 5-minute time limit would be difficult, but everyone stayed on track and the flash talks were a big success. We'll definitely have more flash presentations at future SCS.

This was also the largest SCS I've ever attended or organized: we had 75 poster presentations and even more people registered for the symposium! I enjoyed helping to organize an event that brought people from such diverse and far-reaching backgrounds together. A lot of time and effort goes into SCS, but it's rewarding and worth it in the end.

Finally, we moved the crowd to a nearby restaurant and bar for the “networking event,” which is a chance to let off some steam and enjoy a good meal while avoiding the topic of research entirely. It was great fun, even if Bart did hog all the beer!

A huge thanks to the other SCS organizers; it takes a big team effort to pull off an event like this. Julien Fumey, Mehedi Hassan, Bart Cuypers, Aishwarya Alex Namasivayam, Nazeefa Fatima, Alexander Monzon, Farzana Rahman, Sayane Shome, Dan DeBlasio, R. Gonzalo Parra and Alex Salazar all made invaluable contributions and were a pleasure to work with.

ISCB student council 2014

I submitted some research I’ve been working on (as a byproduct of TAing a first year seminar and leading some students in an independent bioinformatics project) to the International Society for Computational Biology student council symposium. Yesterday I found out it was selected for an oral presentation! This is the first chance I’ve had to present independent research, so needless to say I’m pretty excited.

The talk is titled "Tetranucleotide usage in mycobacteriophage genomes: alignment-free methods to cluster phage and infer evolutionary relationships." Read on for the full abstract.


k-mers are everywhere!

Many problems in bioinformatics involve working with short pieces of DNA sequence. We call these short words k-mers, where k is an integer, usually less than about 30. A k-mer is essentially a substring of a larger DNA sequence. If you're a biologist, you may be wondering why anyone would be interested in anything other than 3-mers, the codons that encode amino acids. As it turns out, k-mers are at the center of many bioinformatics techniques and are the subject of intense algorithms research.
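To make the definition concrete: a sequence of length n contains n - k + 1 overlapping k-mers, one starting at each position. Here's a minimal Python sketch (the function names `kmers` and `count_kmers` are my own, hypothetical choices) that slides a window across a sequence and tallies each k-mer with a dictionary-based counter. It's the naive approach: fine for a small phage genome, but not how you'd count k-mers in a large genome.

```python
from collections import Counter
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer (substring of length k) of seq, left to right."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 windows; none if k > n.
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(seq: str, k: int) -> Counter:
    """Count how many times each k-mer occurs in seq (case-insensitive)."""
    return Counter(kmers(seq.upper(), k))


if __name__ == "__main__":
    seq = "ATGGATGCA"
    print(list(kmers(seq, 3)))   # the 7 overlapping 3-mers
    print(count_kmers(seq, 3))   # ATG appears twice
```

With k = 3 this simply lists the codon-sized words at every offset, not just the in-frame codons.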

Some bioinformatics areas where k-mers play a central role:

  • Genome Assembly. Assemblers based on the overlap-layout-consensus model (such as Celera) or on de Bruijn graphs (like Velvet) use k-mers to build the initial data structure for genome assembly. As overlaps between k-mers are found, the assembled sequence grows!
  • Sequence Alignment. The Basic Local Alignment Search Tool, or BLAST, is arguably the most well-known product of the bioinformatics field. BLAST can find DNA sequences conserved between organisms, uncover horizontal gene transfer and explain why we can’t make a vaccine for the common cold. And it all depends on the initial matching of short k-mers from the search sequence to the database.
  • Sequencing Quality Control. Overrepresentation of k-mers in a next-generation sequencing library can be diagnostic of errors and duplications. The FastQC program computes the usage of 5-mers in sequencing reads as a form of quality control.
  • Alignment-Free Sequence Analysis. My new favorite problem! Expect a post on this soon. Basically, the usage of short k-mers in a genome can be used to infer evolutionary relationships and examine horizontal gene transfer. Kind of like GC content, but with more signal.
  • Codons and Repetitive Regions. Codons, the 3-letter sequences that encode the amino acids that build proteins, are essentially 3-mers with a special biological function. 3-mers are also important in disease, such as the CAG repeats that cause Huntington's disease.
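To show how k-mers become an assembly data structure, here's a toy Python sketch of the de Bruijn graph idea (function names are hypothetical, and this is not Velvet's implementation). Nodes are (k-1)-mers, and every distinct k-mer adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer. A contig is then read off by walking edges while the path is unambiguous. It assumes error-free reads from a single strand and ignores branching, reverse complements and sequencing errors, all of which real assemblers must handle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Each distinct k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
    Using sets means k-mers shared by overlapping reads are stored only once.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph: Dict[str, Set[str]], start: str) -> str:
    """Greedily extend a contig from start while each node has exactly one
    outgoing edge, stopping at a dead end, a branch, or a revisited node."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in seen:          # avoid looping forever around a cycle
            break
        contig += nxt[-1]        # consecutive nodes overlap by k-2 bases
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
    g = de_bruijn_graph(reads, 4)
    print(walk_unambiguous(g, "ATG"))  # reconstructs ATGGCGTGCAA
```

Note how the three overlapping reads collapse into one path: the overlaps are found implicitly, because shared k-mers map to the same edges.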

K-mers are everywhere in bioinformatics. There has been a lot of work on efficient ways (in both computational time and memory) to count k-mers in large genomes. Really impressive and cool algorithms have been developed to solve the k-mer counting problem, some of which I'll be talking about in a later post. It turns out these little words of DNA are important after all!
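Two basic tricks behind many efficient counters are worth a quick sketch. First, each base fits in 2 bits, so a k-mer can be packed into an integer and updated in constant time as the window slides, instead of slicing a new string at every position. Second, because reads can come from either DNA strand, counters often record only the "canonical" k-mer, the smaller of a k-mer and its reverse complement. The Python below illustrates both under those assumptions; the function names are hypothetical, and it is not the algorithm of any particular tool, which would add much more (compact hash tables, parallelism, disk partitioning).

```python
from collections import Counter

# 2-bit code per base: A=0, C=1, G=2, T=3, so the complement of code c is 3 - c.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers, each packed into an integer of 2k bits.

    Windows containing a non-ACGT character (e.g. N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    mask = (1 << (2 * k)) - 1      # keeps only the last k bases of the forward k-mer
    top = 2 * (k - 1)              # bit offset of the first base in a k-mer
    fwd = rev = 0
    valid = 0                      # consecutive ACGT bases seen so far
    counts: Counter = Counter()
    for base in seq.upper():
        code = ENCODE.get(base)
        if code is None:           # ambiguous base: restart the window
            fwd = rev = valid = 0
            continue
        # Forward strand: shift left and append the new base at the low end.
        fwd = ((fwd << 2) | code) & mask
        # Reverse complement: shift right and put the complemented base on top.
        rev = (rev >> 2) | ((3 - code) << top)
        valid += 1
        if valid >= k:
            counts[min(fwd, rev)] += 1
    return counts


def decode(packed: int, k: int) -> str:
    """Turn a packed k-mer back into its nucleotide string."""
    return "".join("ACGT"[(packed >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    counts = count_canonical_kmers("ACGT", 2)
    # AC and its reverse complement GT are counted together; CG is its own reverse complement.
    print({decode(x, 2): n for x, n in counts.items()})
```

Python integers are arbitrary-precision, so this works for any k; compiled tools typically rely on the fact that k ≤ 32 fits a k-mer in a single 64-bit word. If your analysis is strand-specific, you would count `fwd` alone instead of the canonical form.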