The main project I have been working on in New York City is the analysis of a large set of single-cell RNA sequencing data, in collaboration with Laura Donlin (Hospital for Special Surgery Research Institute). The data sets were collected using a low-cost Drop-seq setup developed by the Satija Lab at the New York Genome Center and NYU. The lab is better known for its software (Seurat, https://satijalab.org/seurat/), but this Drop-seq setup costs far less than commercial systems. The 3D-printed device (pictured below), along with a small subset of the data sets I have been working with, was published in Nature Communications last year (https://www.nature.com/articles/s41467-017-02659-x).
(Stephenson et al, Nat Comm, 2018)
The data sets I am looking at are dissociated synovial
membrane tissue (some with matched PBMCs) from four types of
patients: healthy, osteoarthritis, psoriatic arthritis, and rheumatoid
arthritis. They were collected from four different joints: knee, elbow,
shoulder, and hip. After some quality filtering, the full data set
contains roughly 130,000 cells. Below is a plot that shows each cell as a dot.
The cells are plotted along two dimensions obtained by reducing the
130,000 by 29,000 gene expression matrix (cells by genes) with
principal component analysis and uniform manifold approximation
and projection, or UMAP (https://www.nature.com/articles/nbt.4314).
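To make the dimensionality reduction step a bit more concrete, here is a minimal sketch of the PCA half on a toy count matrix, using only NumPy. The function name `pca_scores`, the normalization (scale each cell to 10,000 counts, then log-transform), and the simulated data are all assumptions for illustration, not the actual pipeline used for the plot. In practice the leading PC scores would then be handed to a UMAP implementation, which is left out here.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Return per-cell PC scores for a cells x genes count matrix.

    Hypothetical helper: the normalization below is a common choice,
    not necessarily the one used for the real data set.
    """
    # Scale each cell to the same total count, then log-transform.
    totals = counts.sum(axis=1, keepdims=True)
    norm = np.log1p(counts / totals * 1e4)
    # Center each gene so the SVD corresponds to PCA.
    centered = norm - norm.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Cell coordinates along the leading principal components.
    return u[:, :n_components] * s[:n_components]

# Toy data: 300 cells x 50 genes, where the second half of the cells
# expresses the first 10 genes at a higher rate (a made-up "population").
rng = np.random.default_rng(0)
rates = np.full((300, 50), 10.0)
rates[150:, :10] = 40.0
counts = rng.poisson(rates).astype(float)

scores = pca_scores(counts, n_components=2)
print(scores.shape)
```

On the real 130,000 by 29,000 matrix a dense SVD like this would be far too slow and memory-hungry, so a sparse or truncated solver would be the more realistic choice.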
I am going to hold off on labeling any cell populations, or even saying how
many cell populations there actually are, because I am not yet done with the
analysis. I can say that there is a lot of interesting biology to be learned from
this data and a lot of fun work left to be done!
This is a relatively large data set that is spread across 4
disease states, 5 tissue types, and, perhaps most importantly, 38 different sample
collections/preparations. There is a lot of room for technical differences
between these samples, which can easily skew analyses. There have been many
methods proposed to integrate data sets and remove technical biases, but I am
not sure if anyone has been able to conclusively show that their methods
maintain biological differences after that integration (at least by my own
standards). One important part of my analysis will be to benchmark a few of
these methods (Seurat’s CCA, SCTransform, Scanorama, and Harmony to start with)
and come up with a more quantitative way to measure the changes that these
methods make to our data.
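As a rough illustration of what "more quantitative" could look like, here is a sketch of two simple neighborhood-based scores, written with NumPy only. Both `neighbor_overlap` and `batch_mixing` are hypothetical names and hypothetical metrics, not the benchmark I will end up using. The "integration" in the toy example is just per-sample mean centering, a stand-in for the real methods listed above.

```python
import numpy as np

def knn_indices(embedding, k):
    """Indices of each cell's k nearest neighbors (Euclidean), excluding itself."""
    sq = (embedding ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def neighbor_overlap(before, after, k=15):
    """Mean Jaccard overlap of each cell's neighbor set before vs. after integration."""
    nn_before, nn_after = knn_indices(before, k), knn_indices(after, k)
    scores = []
    for b, a in zip(nn_before, nn_after):
        shared = len(set(b) & set(a))
        scores.append(shared / (2 * k - shared))
    return float(np.mean(scores))

def batch_mixing(embedding, batches, k=15):
    """Mean fraction of each cell's neighbors that come from a different sample."""
    nn = knn_indices(embedding, k)
    return float((batches[nn] != batches[:, None]).mean())

# Toy example: two samples with identical "biology", one shifted by a
# made-up technical offset.
rng = np.random.default_rng(1)
before = rng.normal(size=(400, 5))
batches = np.repeat([0, 1], 200)
before[batches == 1] += 10.0

# Stand-in "integration": center each sample on its own mean.
after = before.copy()
for b in np.unique(batches):
    after[batches == b] -= after[batches == b].mean(axis=0)

mix_before = batch_mixing(before, batches)
mix_after = batch_mixing(after, batches)
overlap = neighbor_overlap(before, after)
print(mix_before, mix_after, overlap)
```

The idea is that mixing should rise after integration while the neighbor overlap shows how much each cell's local neighborhood was rearranged. The all-pairs distance matrix only works for small examples; at 130,000 cells an approximate nearest-neighbor search would be needed.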
We have a lot more fun ideas for combining computational
methods with clinical data. This work won’t end next week when I return to
Ithaca; it will continue for the foreseeable future. The Donlin Lab, De
Vlaminck Lab, and Cosgrove Lab will begin working on a new project to study
myositis at single-cell resolution, and I am looking forward to coming back to
NYC as soon as I can!