CMap @ Broad Institute

From 2015 to 2017, I worked as a computational biologist in the Connectivity Map (CMap) group at the Broad Institute. The goal of CMap is twofold:

  1. Create the world’s largest perturbation-driven gene expression dataset by treating a group of cell lines with small-molecule compounds and with gene knock-down, knock-out, and overexpression. Process and share these transcriptional signatures with the community so scientists can use them to make discoveries in their specific fields.
  2. Leverage the perturbational dataset to uncover new insights into biology. CMap can be used to discover new drugs (and repurpose old ones), understand the transcriptional effects of gene mutations, and gain insight into gene regulatory networks.

My work on CMap was centered around interpreting the perturbational signatures in the context of patient-derived gene expression data. The Cancer Genome Atlas (TCGA) and other projects have made a wealth of DNA and RNA sequencing data publicly available. Transcriptional data from patients with different cancers and genetic alterations begs to be compared with CMap, but there wasn’t a clear method to do so. Comparing the two datasets can answer questions like:

  • Do tumor cells with mutations in a given gene have transcriptional signatures similar to those from our knock-down or inhibition experiments on that gene?
  • Can comparison with CMap be used to annotate sets of patients with similar transcriptional programs that might not otherwise be correlated?
  • Can the transcriptional signature from a patient be used to predict which pathways are active and driving tumorigenesis?
  • Tumors that become resistant to drugs should activate different transcriptional pathways. Can we pick this up with CMap?
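
As a toy illustration of what such a comparison could look like, the sketch below ranks perturbation signatures by their Spearman rank correlation with a single patient signature. It is not the scoring used by CMap or the method developed in this work; the function names, the dictionary layout, and the assumption that all signatures are measured over the same ordered set of genes are hypothetical and chosen only for the example.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks of a 1-D array (ties broken by position, not averaged)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def rank_perturbations(patient_signature, perturbation_signatures):
    """Sort perturbations from most similar to most opposite to the patient.

    patient_signature: 1-D array of per-gene expression changes (hypothetical input).
    perturbation_signatures: dict mapping a perturbation name to a 1-D array
        over the same genes, in the same order (an assumption of this sketch).
    """
    scores = {
        name: spearman(patient_signature, signature)
        for name, signature in perturbation_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers, purely to show the call pattern.
patient = np.array([2.0, -1.0, 0.5, -3.0])
perturbations = {
    "mimics_patient": patient * 2.0,
    "reverses_patient": -patient,
}
ranking = rank_perturbations(patient, perturbations)
```

In this framing, a strongly positive score suggests a perturbation that mimics the patient's transcriptional state, while a strongly negative score suggests one that reverses it.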

I’ve presented this research at a few conferences; you can see my poster here.

If you are interested in the Connectivity Map project, you can learn more and explore the dataset at http://clue.io