BME Summer Immersion 2019: McKellar

Single-cell RNA sequencing is everywhere. It’s the buzzword-y assay that seems to have hit every model organism, every tissue, and every institute across biology. Drop-seq (Macosko et al., Cell, 2015) and other microfluidics-based platforms (10x Chromium) are by far the most commonly used (although admittedly I don’t have any numbers on this… just trust me). But how do these methods work at the molecular scale? How can we routinely capture millions of RNA molecules and identify the cell types they came from? Well, here is how I think about it. Specifically, here is how I think about Drop-seq, since I will continue to view the 10x Chromium platform as a small box filled with black magic.

Shrink yourself down to 10 microns. You’re a tiny, semi-spherical cell floating around in a solution of phosphate-buffered saline, probably with some bovine serum albumin or fetal bovine serum added in to happily coat your surface receptors (you have those now, since you’re a cell). Now, feel yourself getting sucked up, out of your 15 ml Falcon tube, and into a tiny piece of tubing. As you fly through this round tube for what feels like forever, you hit a sudden jerk: a transition into a new, more square chamber. It’s not long before you are engulfed by a droplet, hopefully, though not all that likely, by yourself. You quickly disintegrate (this story is making me regret personifying single cells) into your constituent parts. Fortunately, a fuzzy-looking ball that has joined you in the droplet picks up some of your pieces. This ball seems to have some sort of affinity for your RNAs, but how many of them? As your lytic ride comes to an end, your bubble bursts, and the fuzzy ball settles to the bottom of a new tube…

Now that I’ve mocked the field I study, let me take a moment to explain why I think our current methods (although they were genius breakthroughs that have inspired a new generation of biology) are not as good as we think.

Over the past year, I have come to see single-cell RNA sequencing as a series of probabilistic sampling steps. As the cell lyses, there is no way that every RNA gets captured. I don’t think anyone believes that, but I do think it is important to remember while analyzing these data that you are only looking at a fractional subset of the ssRNA within a cell. Even within that fraction, most assume that every captured molecule is a coding mRNA, captured at its 3′ end. Why? [“Because we learned about Watson-Crick base-pairing in high school, that’s why.”] Well, Watson-Crick base-pairing is only the most probable case. Along these massive molecules, there likely exists some stretch of nucleotides where more than 25% of the bases (the random-chance expectation) are A’s. As the density of A’s in such a stretch goes up, it becomes more likely to bind the poly-dT probe. Now consider the drastically different lengths of poly-A tails on mRNAs. Many are only half the length of the poly-dT probes! (Maier et al., bioRxiv, 2019; work from Bjorn Schwalb’s lab, which has been updated since I last read it.) Why aren’t we trying to improve these capture methods? If every molecule is different, why are we using the same inefficient 25- to 30-base poly-dT probe?
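To put a rough number on the A-rich-stretch argument, here is a minimal sketch that assumes every base is independent and is an A with probability 0.25 (real transcripts are not this uniform). Under that assumption, the number of A’s in a window follows a binomial distribution, so the chance that a given window is A-rich is a binomial tail. The post does not say how A-rich a stretch must be to bind a poly-dT probe, so `WINDOW = 12` and `MIN_A = 10` are hypothetical, as are all the function names.

```python
import random
from math import comb

# Hypothetical threshold: an "A-rich" stretch is any 12-nt window with >= 10 A's.
WINDOW = 12
MIN_A = 10


def prob_window_a_rich(window: int, min_a: int, p_a: float = 0.25) -> float:
    """P(at least `min_a` of `window` independent bases are A): a binomial tail."""
    return sum(
        comb(window, k) * p_a**k * (1 - p_a) ** (window - k)
        for k in range(min_a, window + 1)
    )


def expected_a_rich_windows(length: int, window: int, min_a: int, p_a: float = 0.25) -> float:
    """Expected number of A-rich windows along a transcript of `length` nt.

    Overlapping windows are not independent, but linearity of expectation still holds.
    """
    n_windows = max(length - window + 1, 0)
    return n_windows * prob_window_a_rich(window, min_a, p_a)


def a_rich_windows(seq: str, window: int, min_a: int) -> list[int]:
    """Start positions of windows in `seq` that contain at least `min_a` A's."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if seq[i : i + window].count("A") >= min_a
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # A random 3,000-nt "transcript" with uniform base composition (an assumption).
    transcript = "".join(rng.choice("ACGU") for _ in range(3000))
    print("P(window is A-rich):", prob_window_a_rich(WINDOW, MIN_A))
    print("expected A-rich windows:", expected_a_rich_windows(len(transcript), WINDOW, MIN_A))
    print("observed A-rich windows:", len(a_rich_windows(transcript, WINDOW, MIN_A)))
```

Lowering `MIN_A` or shortening `WINDOW` raises the tail probability quickly; how lenient real hybridization to poly-dT is remains an open question that this sketch does not answer.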

Reverse transcription and PCR amplification are more of the same. Each step, no matter how important, loses more and more of your initial library of RNA molecules. Given the sheer number of events, there is just no way that every one of them gets reverse-transcribed and PCR-amplified. That would be incredibly unlikely. And don’t even get me started on sequencing. You really think that every one of your millions of DNA fragments bound the flow cell, formed its little island, and was properly sequenced? I think not.
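The “series of probabilistic sampling steps” view can be sketched as binomial thinning: each molecule independently survives each step with some probability, so the expected surviving fraction is the product of the per-step efficiencies. The step names and efficiencies below are hypothetical placeholders, not measured values for Drop-seq, and the model ignores PCR duplicates and UMIs; it only tracks whether each original molecule makes it through every step.

```python
import math
import random

# Hypothetical per-step survival probabilities, NOT measured Drop-seq values.
STEP_EFFICIENCY = {
    "capture_on_bead": 0.10,
    "reverse_transcription": 0.50,
    "pcr_amplification": 0.90,  # molecule amplified at least once
    "sequencing": 0.80,
}


def expected_surviving_fraction(efficiencies) -> float:
    """With independent per-step survival, the overall fraction is the product."""
    return math.prod(efficiencies)


def simulate_thinning(n_molecules: int, efficiencies, seed: int = 0) -> list[int]:
    """Pass molecules through each step, keeping each one with probability p.

    Returns the molecule count before the first step and after every step.
    """
    rng = random.Random(seed)
    remaining = n_molecules
    trace = [remaining]
    for p in efficiencies:
        remaining = sum(1 for _ in range(remaining) if rng.random() < p)
        trace.append(remaining)
    return trace


if __name__ == "__main__":
    n = 100_000  # hypothetical number of mRNA molecules in one cell
    effs = list(STEP_EFFICIENCY.values())
    trace = simulate_thinning(n, effs)
    for step, count in zip(["start", *STEP_EFFICIENCY], trace):
        print(f"{step:>22}: {count}")
    print("expected surviving fraction:", expected_surviving_fraction(effs))
```

Even moderately lossy steps compound quickly: with these made-up numbers, fewer than 4% of the starting molecules would survive every step.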

So why should we care? Even with just a small fraction of the RNAs, you can easily identify cell types and find changes in gene expression in disease states or along developmental time. Well, my frustration is that single-cell RNA sequencing is only a machine for generating hypotheses. No drug will come to market with just a couple of 10x data sets backing its efficacy. These data are just not all that trustworthy. The biggest issue is that labs are paying a TON of money to generate these data sets, and then they immediately have a huge plate of new experiments to perform to confirm their findings. Imagine the scientific power that would underlie complete single-cell transcriptome sequencing: every molecule captured and identified with 100% confidence. This is one of the many pipe dreams that underlie my scientific pursuits. Maybe one day, but for now I will continue to be a lowly grad student fishing for non-coding RNAs in oceans of GAPDH transcripts.
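A quick way to see why cell types are still easy to call while rare transcripts slip through: if each molecule independently survives the whole pipeline with overall efficiency `e`, a gene present at `n` copies is observed at least once with probability `1 - (1 - e)**n`. The efficiency and copy numbers below are hypothetical, chosen only to contrast a low-copy non-coding RNA with an abundant transcript like GAPDH.

```python
# Hypothetical overall capture-to-sequencing efficiency (not a measured value).
OVERALL_EFFICIENCY = 0.05


def detection_probability(copies: int, efficiency: float) -> float:
    """P(at least one of `copies` molecules is observed), assuming independent survival."""
    return 1 - (1 - efficiency) ** copies


def expected_observed(copies: int, efficiency: float) -> float:
    """Expected number of this gene's molecules that make it into the data."""
    return copies * efficiency


if __name__ == "__main__":
    # Hypothetical copy numbers per cell.
    for name, copies in [("rare ncRNA", 5), ("GAPDH-like transcript", 1000)]:
        print(
            f"{name}: P(detected) = {detection_probability(copies, OVERALL_EFFICIENCY):.3f}, "
            f"expected molecules observed = {expected_observed(copies, OVERALL_EFFICIENCY):.1f}"
        )
    # The "pipe dream": with efficiency 1.0, every gene present is detected.
    print(detection_probability(5, 1.0))
```

With these numbers, the five-copy RNA is missed in most cells (detected in roughly 23% of them), while the abundant transcript is essentially always seen.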

Friday, June 28, 2019

McKellar - Week 3
