Single-cell RNA sequencing is everywhere. It’s the buzzword-y
assay that seems to have hit every model organism, every tissue, and every
institute across biology. Drop-seq (Macosko et al., Cell, 2015) and other
microfluidics-based platforms (such as 10x Chromium) are by far the most commonly used
(although admittedly I don’t have any numbers on this… just trust me). But how do
these methods work at the molecular scale? How can we routinely capture
millions of RNA molecules and identify the cell types from which they came?
Well, here is how I think about it. Specifically, here is how I think about
Drop-seq, since I will continue to view the 10x Chromium platform as a small
box filled with black magic.
Shrink yourself down to 10 microns. You’re a tiny, semi-spherical
cell floating around in a solution of phosphate-buffered saline, probably with
some bovine serum albumin or fetal bovine serum added in to happily coat your
surface receptors (you have those now, since you’re a cell). Now, feel yourself
getting sucked up, out of your 15 mL Falcon tube, and into a tiny piece of
tubing. As you fly through this round tube for what feels like forever, you hit
a sudden jerk: a transition into a new, more square chamber. It’s not long
before you are engulfed by a droplet, hopefully (though not all that likely) by yourself.
You quickly disintegrate (this story is making me regret my personification of
single cells) into your constituent parts. Fortunately, a fuzzy-looking ball that
has joined you picks up some of your pieces. This ball seems to have some sort of
affinity for your RNAs, but how many of them? As your lytic ride comes to an
end, your bubble bursts, and this fuzzy ball settles to the bottom of a new
tube….
Now that I’ve mocked my own field of study, let me take
a chance to explain why I think our current methods (genius breakthroughs though they were,
and ones that have inspired a new generation of biology) are not as good as we think.
Over the past year, I have come to see single-cell RNA
sequencing as a series of probabilistic sampling steps. As the cell lyses,
there is no way that every RNA gets captured. I don’t think anyone believes
that, but I do think it is important to remember while analyzing these data
that you are only looking at a fraction of the RNA within a cell. Even
within that fraction, most people assume that every captured molecule is a coding
mRNA, captured at its three-prime end. Why? [“Because we learned about
Watson-Crick base-pairing in high school, that’s why.”] Well, Watson-Crick
base-pairing is only the most probable case. Along these massive molecules,
there likely exists some stretch of nucleotides where more than 25% of them
(the fraction expected by random chance) are A’s. As the density of A’s in that
stretch goes up, it becomes more likely to bind the poly-dT probe. Now consider the drastically
different lengths of poly-A tails on mRNAs. Many are only half the length of
the poly-dT probes! (Maier et al., bioRxiv, 2019; work from Bjorn Schwalb’s
lab, which has been updated since I last read it.) Why aren’t we trying to improve these capture methods? If every molecule
is different, why are we using the same, inefficient 25-30 base poly-dT probe?
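
To make the A-rich stretch argument a bit more concrete, here is a minimal Python sketch. It builds a random transcript (an assumption: real transcripts are not uniform random sequences) and counts how many 25-base windows contain at least some minimum number of A’s. The function names, the window size, and the thresholds are all hypothetical choices for illustration, and counting A’s is only a crude stand-in for actual hybridization to a poly-dT probe.

```python
import random


def a_rich_windows(seq, window=25, min_a=15):
    """Return the start index of every window with at least min_a A's.

    Counting A's is a crude proxy for poly-dT binding (an assumption),
    not a hybridization model.
    """
    hits = []
    if len(seq) < window:
        return hits
    # Count A's in the first window, then slide one base at a time.
    count = seq[:window].count("A")
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Add the base entering the window, drop the base leaving it.
            count += (seq[start + window - 1] == "A") - (seq[start - 1] == "A")
        if count >= min_a:
            hits.append(start)
    return hits


def random_transcript(length, rng):
    """Uniform random RNA sequence (assumption: ~25% A at every position)."""
    return "".join(rng.choice("ACGU") for _ in range(length))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
transcript = random_transcript(5000, rng)
for min_a in (10, 15, 20):
    n_hits = len(a_rich_windows(transcript, window=25, min_a=min_a))
    print(f"windows with >= {min_a} A's out of 25: {n_hits}")
```

Raising `min_a` plays the role of demanding a better match to the probe. Swapping the random transcript for a real sequence would be the more interesting experiment.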
Reverse transcription and PCR amplification are more of the
same. Each step, no matter how important, just loses more and more of your
initial library of RNA molecules. Given the sheer number of events, there is
just no way that every one of them gets reverse-transcribed and PCR-amplified.
That would be incredibly unlikely. And don’t even get me started on sequencing.
You really think that every one of your millions of DNA fragments bound the flow
cell, formed its little island, and was properly sequenced? I think not.
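
Here is a rough way to picture that chain of losses, as a minimal Python sketch. It treats each step as an independent coin flip per molecule, which is my simplifying assumption, and the per-step efficiencies are made-up numbers for illustration, not measured values. The "PCR" entry stands for the chance that a cDNA gets amplified enough to be seen at all, not for the amplification itself. The function name `surviving_molecules` is hypothetical.

```python
import random


def surviving_molecules(n_start, step_efficiencies, rng):
    """Pass n_start molecules through successive lossy steps.

    Each molecule independently survives a step with that step's
    probability (binomial sampling). Returns the count after each step,
    starting with n_start itself.
    """
    counts = [n_start]
    n = n_start
    for p in step_efficiencies:
        # One coin flip per molecule still in play.
        n = sum(1 for _ in range(n) if rng.random() < p)
        counts.append(n)
    return counts


# Hypothetical per-step efficiencies, chosen only for illustration.
steps = {
    "capture": 0.3,
    "reverse transcription": 0.5,
    "PCR": 0.8,
    "sequencing": 0.6,
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
counts = surviving_molecules(100_000, steps.values(), rng)
print(f"start: {counts[0]} molecules")
for name, n in zip(steps, counts[1:]):
    print(f"after {name}: {n} molecules ({n / counts[0]:.1%} of start)")
```

Under these assumptions the expected fraction that makes it all the way through is just the product of the per-step efficiencies, so even fairly generous numbers at each step compound into a small final sample.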
So why should we care? Even with just a small fraction of
the RNAs, you can easily identify cell types and find changes in gene expression
in disease states or along developmental time. Well, my frustration is that
single-cell RNA sequencing is only a machine to generate hypotheses. No drug will
come to market with just a couple of 10x data sets backing its efficacy. These data
are just not all that trustworthy. The biggest issue is that labs are
paying a TON of money to generate these data sets, only to
immediately face a huge plate of new experiments they need to perform to
confirm their findings. Imagine the scientific power of
complete single-cell transcriptome sequencing: every molecule captured and identified
with 100% confidence. This is one of the many pipe dreams that underlie my
scientific pursuits. Maybe one day, but for now I will continue to be a lowly
grad student fishing for non-coding RNAs in oceans of GAPDH transcripts.