Road Trip: Backcountry in the Badlands 2

I woke up to sunrise in the Badlands. The landscape here is amazing – rolling prairie fields, towering mesas and sedimentary mountain peaks surround you. I spent a while looking around in the early morning light.

2015-06-02 05.35.10 2015-06-02 05.34.55

That second photo is of a bison surprisingly close to our camp. Definitely not the last one we would see today. After breakfast, we spent a while with the map and compass trying to figure out where we had wandered to camp last night. “Is it declination West, compass best? Or the other way around?”

2015-06-02 08.27.22

We planned a route to explore the landscape and set off into the desert. We trekked through canyons, explored cliffs and navigated through river beds for most of the day.

2015-06-02 09.07.50 2015-06-02 09.18.32
(Angela, are you texting in the backcountry?!)

At one point, we climbed a small hill and came upon a bison a little too close for comfort. We froze and watched as he stared back. Soon, the two bison (one was hidden behind the rocks) charged off to our right. These creatures can weigh up to a ton. It was an exciting, slightly terrifying experience to witness the raw power they possess.

2015-06-02 08.58.46

By 1pm, the sun and hot temperatures were almost too much to bear, so we headed back to camp for some shade and lunch. Much to our surprise, there was a bison hanging out by our camp! When he didn’t move after an hour, I sneakily grabbed our tents and we found a new spot to spend the night. Much of the rest of the afternoon was spent lounging around, reading, and napping – the sun was just too hot.

A small storm rolled in after dinner, but passed quickly. If you look carefully, there’s a bison at the end of the rainbow!

2015-06-02 19.30.43

I fell asleep at 9, but by 10:30 another storm was raging outside. I spent the next hour and a half sitting in my tent, holding the poles up so the wind didn’t blow it on top of me. Outside, lightning flashed and the wind howled. Angela’s tent eventually collapsed, forcing us to relocate to a more sheltered spot. Morning finally came, and we hiked out to the cars.

Badlands was the most inhospitable environment I’ve ever stayed in. Simply maintaining a camp there was a constant battle against:
1) High temperatures and constant sun. It’s impossible to get anything done in the afternoon hours, and there aren’t any shady spots you can retire to.
2) Exposure to storms and lightning. It’s hard to find a camp that’s protected from wind and lightning. Staying out there through a storm was honestly frightening.
3) No potable water. There are some rivers in the territory, but the water isn’t safe to drink even after filtering or treating. This means you have to bring a ton of water (see point 1), and even so, I was definitely dehydrated by the time we left.
4) Bison. They will charge and gore you if you’re not careful, and it’s easy to surprise them in this landscape.
5) Mosquitoes. They make everything about just existing in this environment more difficult. By the time I left, the backs of my legs were each covered in a hundred bites – no kidding.

I learned a lot about staying in harsh environments after two nights in the Badlands. I was definitely not sad when we got back to the car and on the road again, though!

Road Trip: Backcountry in the Badlands

We arrived at Badlands National Park later in the day on Monday. We planned to spend Monday and Tuesday night out in the backcountry, and stopped by the ranger station to ask for advice on where to stay. Surprisingly, they had little to recommend beyond “well, there’s no trails, so go wherever you want and climb whatever you want.”

Oh, and this great addition: “We haven’t had a rescue yet this season… so if you guys wanted to give us a little action… we wouldn’t mind.”

Badlands National Park: the land of extreme heat, no water, no trails, frequent thunderstorms, massive bison, and rangers who want a reason to go out for a rescue. We were going out for two nights in an area few people backpack in. Great. 

Not deterred by the ranger’s words or reports of thunderstorms rolling in, we drove through the park to find a spot to explore. We parked at “prairie dog town” and got our gear together with the backdrop of a beautiful sunset and rainstorm off in the distance.

2015-06-01 20.08.09

We hiked about a mile to find a suitable spot to camp, trying to balance shelter from the wind, protection from the incoming storm and distance from the mosquitoes.

That’s right, mosquitoes. Let me tell you, the mosquitoes in the Badlands are worse than anywhere else on earth. I realized this soon after setting off into the prairie – I was able to kill five of the buggers with a single slap at the back of my legs. Bug spray didn’t help, and the strong wind was only a slight deterrent. I soon suited up in full pants and rain gear, even though the temperature was still over 70.

Backcountry meal: Annie’s mac’n cheese with homemade canned salmon (delicious)

Lessons learned:
1) The “witching hour,” when mosquitoes are at their worst, is a real thing
2) Cacti will stab you if you’re not careful
3) You can get a lot of reading done if your tent is the only suitable place to hang out

Road Trip: Michigan to the Badlands

The next part of the trip had a ton of driving. It was a 9-hour trip from Michigan to my friend’s house in Ames, Iowa, where we spent Sunday night. Thanks, Andrew, for the cookout, great beer and letting us spend the night.

We left early Monday morning and had an 8-hour drive to Badlands National Park in South Dakota. This felt like the longest stretch so far – it’s hard to describe how flat the terrain is here. Long, straight stretches of I-90 necessitated loud country music (something about the landscape just makes country feel right, okay?!) and some road-selfies.

2015-06-01 15.28.20

Listening to the radio in this part of the country was an interesting experience. All the ads are about farming: “Need a new grain elevator? Choose my company for the best quality and price!” There were also daily price listings for wheat and other grains.

Miles driven in the past two days: 1,200

Roadside yoga sessions: 2

Route followed:

Screenshot from 2015-06-06 15:02:35

Day 2: Rockwood, NY to Lake Orion, MI (and Niagara Falls)

Early start this morning! Which would have been great, had I not forgotten a part of my Aeropress coffee maker at home… If you know me, you know I need coffee in the morning. And not just any coffee – the more labor that went into the cup, the better.

What do you do when you forget something and you’re in the woods? Make do with what you have!

2015-05-30 07.27.22

Coffee: brewed weak

Spirits: Sinking slightly

A four-hour drive took us to the Canadian border, where we stopped to check out Niagara Falls. The amount of water going over these cliffs is mind-blowing – a picture doesn’t do it justice. The falls are loud, chaotic, and untamed. They throw mist into the air hundreds of feet away (surprisingly refreshing, as it was pushing 85 today).

2015-05-30 13.17.22 2015-05-30 13.16.40

Another four hours took us back into the U.S.A. to stay with my family in Lake Orion, MI (thank you, Jaacks!).

Miles traveled today: 528

Rainstorms encountered: 2?

Route followed:

Screenshot from 2015-05-30 22:57:37

Road Trip: Day 1

The trip is off to a good start!

Today, I packed up the Saab with camping gear, some leftover food from my apartment, and my hammock. I left Cape Cod and headed to Scituate, MA to pick up Angela. She’s moving to Seattle, so the car is very full with all of our supplies…

2015-05-29 14.39.31

(Those are just the back seats, you don’t even want to see the trunk)

2015-05-29 14.39.50

Can’t forget the Brown hat!

After a quick last dip in the East Coast water

2015-05-29 14.50.06

We were on our way. It was just over a 4-hour drive to a state forest in New York, where we spent the night. I had a great place to toss up my hammock in the woods, but not a very restful sleep.

2015-05-30 06.53.09

Miles traveled: 288

Bears spotted: 0

Spirits: high

Route followed:

Screenshot from 2015-05-30 22:55:04

Start of a road trip

I have a lot to update on this site… my last post is a year old!

Well, those updates will come. Most recently, I’m starting a one-month road trip across the United States. I’m going to use this blog as a way to share my thoughts, experiences and pictures.

Here is the proposed route I will be taking.

Screenshot from 2015-05-29 12:45:43

Section 1: Sandwich, MA to Seattle, WA with Angela Ramponi

Section 2: Seattle, WA to Los Angeles, CA with my girlfriend Lizzy Kinnard

Section 3: Los Angeles, CA back to Sandwich, MA, going solo.

Most of the packing is done and I’m going to be driving off this afternoon. Wish me luck!

What’s in the portion of reads that don’t map to a reference?

One of the first steps in the analysis of most next generation sequencing datasets (unless you’re doing a novel genome or transcript assembly) is mapping to a reference genome. Mapping is a procedure that determines the location in the genome that each sequencing read came from. If you have good sequencing data, most of the reads will be mapped by the program you chose to use.

What about the small (usually <5%) portion of reads that fail to map, then? What can we learn from these reads? Can they be used for quality control or actual analyses?

As it turns out, a lot can be learned by analyzing unmapped reads. Let’s start by understanding how a read can fail to map to a reference genome.

  • Low quality or complexity: Some sequencing reads are filled with low quality base calls – either several ‘N’ bases or poor quality scores. These reads are usually eliminated by a filtering step before any downstream analysis. Low complexity reads – homopolymer and heteropolymer repeats, for example – are also difficult or impossible to align to a unique position. Neither kind of read encodes much useful information, but both can help in judging the quality of the sequencing library before further analysis. Trimming the low quality bases (if they fall in a consistent position across the dataset) is one way to improve alignment.
  • Ambiguous alignment: Reads from repetitive parts of the genome may align to more than one position. In humans, this can be a large portion of the sequencing data, since over 50% of the human genome is repetitive DNA. Depending on the aligner and parameters you choose, reads with ambiguous alignments may be reported in one position or fail to map. Bowtie2, for example, reports a single alignment for ambiguous reads by default; it chooses between the best possible alignments with a random number generator.

    How can they be useful?

    Ambiguous reads can be used to find information on the repetitive part of the genome – what many scientists once called ‘junk DNA’. Repetitive sequences are actually important for

  • Discordant alignment (paired end sequencing): Paired end reads should be separated by a certain number of bases (plus or minus some standard deviation) when they map to a genome. This is because paired end protocols sequence both ends of DNA fragments that are all roughly the same length. A pair whose mates map much farther apart or closer together than expected (or to different chromosomes) is called discordant. Once again, the reporting of discordant alignments differs with the program and parameters.

    What can you do with them?

    Discordant alignments can give information about genome rearrangements, such as deletions, insertions and duplications. For example, if there’s strong evidence for two reads aligning at a distance greater than the insert size, it’s possible some DNA between the two loci was deleted. The inverse is also true: reads aligning at a distance less than the insert size can indicate novel insertions, such as retrotransposons. Peter Park’s lab at Harvard has been developing algorithms to detect these events in NGS data and has applied them to look at genome rearrangements in cancer.

  • The read came from another organism: A tissue sample isn’t always a pure culture of the cells you want to look at. Humans are host to a huge number of microbes, viruses and parasites that inevitably end up in a tissue sample. This community is called the microbiome, and it has been increasingly studied and found to be very important in health and disease. If other organisms are present in a tissue sample that’s being sequenced, some of their DNA will be sequenced as well. These reads won’t map to the reference genome.

    What can they tell us?

    Sequencing reads from the microbiome can tell you a lot about the communities of bacteria, fungi and viruses living in a sample. Several studies have compared the microbiome of individuals using next generation sequencing data.
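To make the first three categories above more concrete, here is a minimal Python sketch of how one might triage reads. It is not the analysis pipeline from this post. All thresholds are hypothetical: `MAX_N`, `MIN_MEAN_Q`, `MIN_KMER_ENTROPY`, and a 300 bp mean insert size with a 50 bp standard deviation. The same goes for all function names. The sketch assumes Phred+33 quality strings, and the pair check ignores mate orientation. In a real analysis, the aligner's SAM output already flags unmapped reads and improperly paired reads. This sketch only illustrates the logic behind those labels.

```python
import math
from collections import Counter

# Hypothetical thresholds, chosen for illustration only; tune them per dataset.
MAX_N = 5               # more 'N' calls than this -> low quality
MIN_MEAN_Q = 20.0       # mean Phred score below this -> low quality
MIN_KMER_ENTROPY = 2.0  # 3-mer entropy (bits) below this -> low complexity


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (bits) of the k-mer composition; 0 for a homopolymer."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_low_quality(seq: str, qual: str) -> bool:
    """Too many ambiguous bases or a poor average quality score."""
    return seq.upper().count("N") > MAX_N or mean_phred(qual) < MIN_MEAN_Q


def is_low_complexity(seq: str) -> bool:
    """Homopolymer or short-period repeats give a low k-mer entropy."""
    return kmer_entropy(seq.upper()) < MIN_KMER_ENTROPY


def is_ambiguous(alignment_scores: list[int]) -> bool:
    """True if two or more candidate loci tie for the best alignment score."""
    if not alignment_scores:
        return False
    best = max(alignment_scores)
    return alignment_scores.count(best) > 1


def classify_pair(chrom1: str, pos1: int, chrom2: str, pos2: int,
                  mean_insert: float = 300.0, sd: float = 50.0,
                  n_sd: float = 3.0) -> str:
    """Label a read pair by the distance between its mapped positions.

    Simplification: ignores mate orientation and read length, and treats the
    distance between the two leftmost positions as the insert size.
    """
    if chrom1 != chrom2:
        return "discordant: different chromosomes"
    distance = abs(pos2 - pos1)
    if distance > mean_insert + n_sd * sd:
        return "discordant: too far apart (possible deletion)"
    if distance < mean_insert - n_sd * sd:
        return "discordant: too close (possible insertion)"
    return "concordant"
```

For example, with these defaults `classify_pair("chr1", 1000, "chr1", 3000)` is labeled a possible deletion, because 2,000 bp exceeds 300 + 3 × 50 = 450 bp. Using k-mer entropy rather than single-base entropy is a deliberate choice. A dinucleotide repeat such as `ATATAT…` has perfectly balanced base counts but only two distinct 3-mers, so it is still flagged as low complexity.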

Those are all the cases I can think of for why a read wouldn’t map to the reference, although it’s possible I missed some. In my next post I’ll talk about the analysis I’ve been doing on the unmapped portion of sequencing data and some interesting results!

Biotech and software companies at ISMB 2014

In my past three posts I talked about the highlights of the 2014 Intelligent Systems for Molecular Biology conference in Boston, MA. In addition to all the academic talks and events, there were a few industry presentations that stood out. I was pleased with the industry presence at the conference. As a student potentially looking for an industry job after I graduate, I enjoyed the chance to talk with some potential employers and see what kinds of positions are available for people with a bachelor’s degree in comp bio.

Good news: every company I talked with seemed willing to hire a programmer or data scientist with a bachelor’s degree. These positions typically weren’t advertised on their websites, so I get the feeling it takes some networking to actually get hired. It was definitely an encouraging experience, though!

A few of the industry partners gave presentations during the workshop sessions at ISMB. I attended two interesting talks:

Appistry and the “pipeline challenge”
Appistry (St. Louis, MO) develops high performance computing solutions and software for genomic analysis. Have you heard of the Genome Analysis Toolkit (GATK), the software developed by the Broad Institute for variant discovery and genotyping in next generation sequencing data? Well, the Broad chose Appistry as the commercial partner for the GATK.

The speaker first highlighted Ayrris, Appistry’s high performance computing platform. I didn’t get the technical details, but it sounds like Ayrris has built-in support for troubleshooting genomics pipelines (something I spend so much of my time doing).

He then talked about the Pipeline Challenge. Appistry is sponsoring a contest for the best genomic analysis pipeline ideas. The winner will receive $70k in bioinformatics software and computer hardware. The Neretti lab has developed a few pipelines and ideas that would fit this contest well… I’m going to look into submitting one! Perhaps the new work we’ve been doing on the human “virome” and its role in cancer and disease?

Seven Bridges Genomics on bioinformatic reproducibility
Seven Bridges Genomics (Cambridge, MA) also develops software and pipelines for bioinformatic analysis. The focus of their presentation wasn’t on the actual pipelines, though, but rather methods and software they’re developing to increase reproducibility in genomics analysis. The speaker made a good point early on: publications often cite an image of a software pipeline in the methods section. When other researchers try to replicate the results, either with their own software or with the code published along with the paper, the analyses often don’t line up (and sometimes fail entirely). This is a huge problem in computational biology and bioinformatics – Titus Brown frequently blogs about reproducibility and most of the BOSC Special Interest Group focused on it as well.

Seven Bridges is proposing a solution based on their software platform Rabix. They plan to use docker images to distribute the software as well as any dependencies used to do analysis in a publication. Docker is a lightweight way to distribute software and ensure it will run in any software environment – an alternative to bulky virtual machines that are sometimes published in an effort to distribute code. According to Seven Bridges, “With Rabix, data, tools and pipelines can be published in open repositories which will enable the community to both host and reuse them on their own infrastructure. This way, we can share the analysis itself, show instead of tell, and create reproducible building blocks to further research.”

Highlights from ISMB – Day 3

Today was the third and final day of the main ISMB conference! I slept in until noon (attending these things is surprisingly tiring) so I missed some of the morning sessions, but it was still a good day. Some highlights:

Workshop on alternative methods of peer review
The talks in this workshop focused on open access in publishing and scientific reproducibility. An increasingly popular topic is open peer review, where all aspects of the peer review process are published. This means the names of the reviewers, their comments and the author’s responses are all published with the online version of the article. In theory, this is a great idea. It increases openness, ensures readers are aware of problems with the article (both past and present) and lets authors know who is reviewing their article.

In practice, though, open peer review is difficult to implement. Some members of the audience brought up points of contention. For example, reviewers of a “big name” paper might be hesitant to criticize their superiors in the scientific community. Open peer review may also make it more difficult for editors to find reviewers for articles. The data say otherwise, though – since BMJ opened up the review process, only 2% of reviewers have declined to review an article because of the change in policy. Other journals like F1000Research also operate on the open peer review model and seem to be doing just fine. I think the “openness” trend has just started to gain momentum and acceptance within the scientific community – it’ll be interesting to see how both authors and publishers respond in the future.

Final keynote by Russ Altman
The Altman lab at Stanford is doing some excellent work using informatics approaches to understand drug response. The ultimate goal is true pharmacogenetics and personalized medicine – imagine a doctor genotyping you in the office and picking a specific drug and dosage known to work best with your specific genes. His lab is doing a lot of machine learning and data mining on FDA drug interaction data and other publicly available sets. Along with some creative use of Amazon Mechanical Turk, they created a database of gene/drug relationships and ranked the side effects by severity.

The Altman lab is also working on predicting novel drug binding sites using protein structure. The method was complex and used a lot of interesting machine learning techniques (another reason I want all these talks to be online – going back to review and understand all the methods). In the end, they could predict which small molecules are most likely to interact with a protein’s active site and potentially inhibit it. These small molecules could be synthesized as part of a new drug, or identified in an existing drug that could then be repurposed.

The symposium then concluded with some closing remarks by the ISCB board members and awards for various posters and presentations. Overall, I was very pleased by the past few days and happy I attended. The personal and professional connections I made will help me in my search for a job and/or grad program. I saw some inspiring and interesting research, learned of cutting edge methods in the field and got to meet scientists I’ve only seen on paper before this week. A huge thanks to the ISCB members who organized this conference, as well as the Student Council for giving me the chance to present my research on Friday.

Highlights from ISMB – Day 2

I’ll be continuing my updates on the Intelligent Systems for Molecular Biology (ISMB) conference with some of the research that I saw today.

Tracking Cells in 4D by live cell imaging
Terumasa Tokunaga from the Institute of Statistical Mathematics in Japan presented a novel algorithm for tracking the positions of cells in 3D over time. He applied the algorithm to track neurons in C. elegans that had been fluorescently labeled and imaged by fluorescence microscopy. Briefly, the algorithm identifies cells as the local maxima in density after smoothing with a kernel density function. A “repulsion” parameter aids in capturing local maxima rather than converging on the absolute maximum of the density. A maximum spanning tree is then built between cells and used to track the movement over time. This helps avoid merging the trackers of individual cells or having trackers switch from one cell to another.

This method could be applied to study differentiation in more complex organisms once the authors extend it to allow for cell division. They did the initial work in C. elegans because adults have a fixed number of cells, so they could assume no cell division occurs.
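The detection step in Tokunaga’s talk treats cells as local maxima of a kernel-smoothed density. That step can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Tokunaga’s algorithm. It works on 2D point coordinates (for example, bright-pixel positions) rather than 3D image stacks. It uses a Gaussian kernel with a hypothetical `bandwidth` and a hypothetical `min_density` cutoff. It also omits both the repulsion term and the spanning-tree linking across time points.

```python
import math


def gaussian_density(points, width, height, bandwidth=2.0):
    """Evaluate a Gaussian kernel density estimate at every integer grid cell."""
    two_h2 = 2.0 * bandwidth ** 2
    grid = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            grid[y][x] = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / two_h2)
                for px, py in points
            )
    return grid


def local_maxima(grid, min_density=1e-3):
    """Return (x, y) cells whose density is strictly above all 8 neighbours."""
    height, width = len(grid), len(grid[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = grid[y][x]
            if value < min_density:
                continue  # ignore near-empty background
            neighbours = [
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(value > n for n in neighbours):
                peaks.append((x, y))
    return peaks
```

For instance, two small clusters of points centered at (5, 5) and (20, 15) on a 30 × 25 grid should each produce one peak. A tracker would then link the peaks found in consecutive frames. In the talk, that linking was done with a maximum spanning tree, which is not shown here.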

ISCB and Student Council business meeting
I was interested in this meeting because I want to volunteer with the ISCB Student Council in planning the next conference (in Dublin, no less)! The Student Council also presented awards for the best oral presentation and the two best poster presentations to three very talented students whom I met on Friday.

Keynote by ISCB Overton Prize winner Dana Pe’er
This was, hands down, one of the most interesting scientific talks I’ve ever seen. Pe’er is answering questions about cell differentiation and heterogeneity through high-throughput single cell analysis methods. The concentrations of several biomarkers can be quantified at the single cell level through a technique known as mass cytometry. Measuring these quantities in single cells puts statisticians back in the environment they are comfortable with – a small number of variables and many samples (as opposed to the “big p, small n” situation so common in genomics). She also introduced some new methods for multivariate statistical analysis and dimensionality reduction (notably the Wanderlust and DREMI algorithms) that deserve a blog post of their own!

Reception at MIT
The ISCB was nice enough to give the volunteers a ticket to a reception at the MIT Museum after the talks finished today. This was a great chance to socialize with the students and others in a cool environment, see the museum (it’s changed a lot since I was there 6 years ago!) and munch on some hors d’oeuvres. I could definitely get used to these kinds of events at conferences!

ISMB finishes tomorrow. Stay tuned for some more highlights and a post about the industry members I’ve been talking with.