Tuesday, July 23, 2019

Week 6 - McKellar

The main project I have been working on in New York City is the analysis of a large set of single-cell RNA sequencing data in collaboration with Laura Donlin (Hospital for Special Surgery Research Institute). The data sets were collected using a low-cost Drop-seq setup that was developed by the Satija Lab, down at the New York Genome Center at NYU. The lab is better known for its software development (Seurat, https://satijalab.org/seurat/), but this Drop-seq setup boasts a tiny price tag compared to commercial systems. The 3D-printed device (pictured below), as well as a small subset of the data sets that I have been working with, was published in Nature Communications last year (https://www.nature.com/articles/s41467-017-02659-x).


(Stephenson et al, Nat Comm, 2018)

The data sets I am looking at are dissociated synovial membrane tissue (some with matched PBMCs) that came from 4 different types of patients: healthy, osteoarthritis, psoriatic arthritis, and rheumatoid arthritis. They were collected from four different joints: knee, elbow, shoulder, and hip. After some quality filtering, the full data set contains roughly 130,000 cells. Below is a plot that shows each cell as a dot. The cells are plotted along two dimensions that are defined by reducing the dimensionality of the gene expression matrix (130,000 cells by 29,000 genes) using principal component analysis followed by uniform manifold approximation and projection, or UMAP (https://www.nature.com/articles/nbt.4314). I am going to hold off on labeling any cell populations, or even saying how many cell populations there actually are, because I am not yet done with the analysis. I can say that there is a lot of interesting biology to be learned from this data and a lot of fun work left to be done!
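For readers who have not seen this kind of dimensionality reduction before, here is a minimal sketch of the first step (PCA) on a small synthetic count matrix. This is not the actual pipeline used for the data above: the log-normalization, the scale factor of 10,000, the choice of 10 components, and the function name `pca_scores` are all assumptions made for illustration. In a real analysis the resulting PC scores would then be passed to a UMAP implementation to get the final two plotting dimensions; that step is not shown here.

```python
import numpy as np


def pca_scores(counts, n_components=10):
    """Project cells (rows) onto the top principal components.

    `counts` is a cells-by-genes matrix of raw counts. The normalization
    below is an assumed, commonly used preprocessing step, not necessarily
    what was applied to the data in this post.
    """
    # Scale every cell to the same total count, then log-transform.
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    normalized = np.log1p(counts / totals * 1e4)

    # Center each gene so PCA captures variance rather than mean expression.
    centered = normalized - normalized.mean(axis=0)

    # PCA via singular value decomposition: scores = U * S.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic stand-in for real data: 200 "cells" by 50 "genes",
# where the first 100 cells express the first 10 genes more highly.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(200, 50)).astype(float)
counts[:100, :10] += rng.poisson(5.0, size=(100, 10))

scores = pca_scores(counts, n_components=10)
print(scores.shape)
```

The full 130,000 by 29,000 matrix would be far too large for a dense SVD like this one, which is why real tools rely on sparse matrices and truncated or approximate decompositions.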
  


This is a relatively large data set that is spread across 4 disease states, 5 tissue types, and, perhaps most importantly, 38 different sample collections/preparations. There is a lot of room for technical differences between these samples, and those differences can easily skew analyses. Many methods have been proposed to integrate data sets and remove technical biases, but I am not sure that anyone has conclusively shown that their method maintains biological differences after integration (at least by my own standards). One important part of my analysis will be to benchmark a few of these methods (Seurat’s CCA, SCTransform, Scanorama, and Harmony to start with) and come up with a more quantitative way to measure the changes that these methods introduce into our data.
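To make "quantitative" a bit more concrete, here is one possible kind of measurement, sketched on synthetic data. It is not the metric I have settled on, and the function name `same_sample_fraction`, the choice of k = 10, and the crude per-sample centering used as a stand-in for "integration" are all assumptions for illustration. The idea is to ask, for each cell, what fraction of its nearest neighbors in the embedding come from the same sample: values near 1 suggest samples are separated (possibly by technical effects), while values near the chance level suggest they are well mixed.

```python
import numpy as np


def same_sample_fraction(embedding, sample_ids, k=10):
    """Mean fraction of each cell's k nearest neighbors from its own sample."""
    sample_ids = np.asarray(sample_ids)

    # Pairwise squared Euclidean distances (O(n^2) memory, demo-sized only).
    sq = np.sum(embedding ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor

    # Indices of the k closest cells for every cell (order within k not needed).
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return float(np.mean(sample_ids[nn] == sample_ids[:, None]))


# Two synthetic "samples" with identical biology but a large technical shift.
rng = np.random.default_rng(0)
sample_a = rng.normal(size=(100, 5))
sample_b = rng.normal(size=(100, 5)) + 10.0
embedding = np.vstack([sample_a, sample_b])
ids = np.array([0] * 100 + [1] * 100)

before = same_sample_fraction(embedding, ids)

# Crude stand-in for an integration method: center each sample separately.
corrected = embedding.copy()
for s in np.unique(ids):
    corrected[ids == s] -= corrected[ids == s].mean(axis=0)

after = same_sample_fraction(corrected, ids)
print(before, after)
```

A score like this only captures how well samples mix. On its own it cannot tell whether real biological differences were erased along with the technical ones, so it would need to be paired with a measure of how well known cell populations are preserved.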

We have a lot more fun ideas for combining computational methods with clinical data. This work won’t end next week when I return to Ithaca; it will continue for the foreseeable future. The Donlin Lab, De Vlaminck Lab, and Cosgrove Lab will begin working on a new project to study myositis at single-cell resolution, and I am looking forward to coming back to NYC as soon as I can!


Week 7 - Chase Webb

Since this post is coming after the conclusion of the immersion experience, I wanted to take the time to reflect on it as a whole. Overall, ...