Prioritization of Genetic Mutations in Cancer Cell Lines

From March to May 2022, I conducted research in Professor Bonnie Berger’s lab at MIT, where I developed computational methods to study the genetic basis of cancer. I analyzed a large collection of genetic mutations discovered in cancer cell lines at the Broad Institute and prioritized the top mutations for further investigation.

Each mutation I studied was a single-nucleotide polymorphism (SNP), a substitution of a single nucleotide (A, C, G, or T) at a specific position in the DNA sequence. In the example shown below, the nucleotide at position 102016044 on chromosome 10 is altered from C to T.

chr10 102016044 C T
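
A record in this four-column form can be read into a structured value before any further processing. The following is a minimal sketch, assuming whitespace-separated fields in the order chromosome, position, reference base, alternate base; the names `Mutation` and `parse_mutation` are hypothetical and are not taken from the project's code.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    """One single-nucleotide substitution (hypothetical container type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


VALID_BASES = {"A", "C", "G", "T"}


def parse_mutation(line: str) -> Mutation:
    """Parse a line such as 'chr10 102016044 C T' into a Mutation."""
    # Assumption: exactly four whitespace-separated fields per record.
    chrom, pos, ref, alt = line.split()
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError(f"not a single-nucleotide substitution: {line!r}")
    if ref == alt:
        raise ValueError(f"reference and alternate bases are identical: {line!r}")
    return Mutation(chrom, int(pos), ref, alt)


example = parse_mutation("chr10 102016044 C T")
```

Validating the bases at parse time means that malformed records fail early instead of silently failing to match later.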

Given 4,803 mutations like this, my goal was to reduce the collection to a smaller set that could be investigated further. To do so, I developed software that first lifted over the mutations from the hg38 to the hg19 reference genome build for standardization. To rank their importance, I then determined which mutations also occurred in clinical patient samples from the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). I ultimately identified 17 mutations that our experimental collaborators at the Broad Institute could analyze further. The project was implemented and rigorously tested using Python and Linux shell scripts.
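
The matching step can be illustrated with a short Python sketch. It assumes the cell-line mutations have already been lifted over to hg19 (that step depends on an external tool such as UCSC liftOver with a chain file and is omitted here) and that each patient cohort is available as a set of (chromosome, position, reference, alternate) tuples. The function name `prioritize`, the rule of ranking by the number of cohorts containing a mutation, and the toy coordinates are all assumptions made for illustration; they are not the project's actual code, criteria, or data.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, position, reference base, alternate base)
Variant = Tuple[str, int, str, str]


def prioritize(
    cell_line: List[Variant],
    cohorts: Dict[str, Set[Variant]],
) -> List[Tuple[Variant, List[str]]]:
    """Keep mutations seen in at least one cohort, most widely seen first."""
    hits = []
    for variant in cell_line:
        # Names of the cohorts that contain exactly this substitution.
        found_in = sorted(
            name for name, variants in cohorts.items() if variant in variants
        )
        if found_in:
            hits.append((variant, found_in))
    # Assumed ranking rule: more cohorts first, then by coordinate for stability.
    hits.sort(key=lambda item: (-len(item[1]), item[0]))
    return hits


# Made-up toy records for illustration only; these are not real data.
cell_line = [
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
]
cohorts = {
    "ICGC": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "TCGA": {("chr2", 200, "C", "T")},
}

ranked = prioritize(cell_line, cohorts)
for variant, found_in in ranked:
    print(variant, found_in)
```

Storing each cohort as a set keeps the membership test constant-time on average, so a few thousand cell-line mutations can be checked against large patient cohorts quickly.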