scSpliceAtlas: Mapping Alternative Splicing at Single-Cell Resolution
The Splicing Blind Spot in Single-Cell Genomics
During my PhD at the CUHK School of Life Sciences, I spent a lot of time working with single-cell RNA-seq data. One thing that always bothered me was how almost every analysis pipeline focused exclusively on gene expression: counting reads per gene, clustering cells, finding differentially expressed genes. But alternative splicing? Almost completely ignored at the single-cell level.
This is a significant gap. Alternative splicing is one of the key mechanisms that expands proteomic diversity from a relatively compact genome. Different cell types don’t just express different genes; they splice the same genes differently. And yet, the single-cell community had largely been treating each gene as a single monolithic unit, collapsing all of its isoforms into one count.
That frustration led me to build scSpliceAtlas — the first comprehensive database and toolkit for exploring cell-type-specific alternative splicing patterns across human tissues using single-cell RNA-seq.
What scSpliceAtlas Does
scSpliceAtlas is built around three integrated components:
1. A Snakemake Pipeline that processes Smart-seq2 data from the Human Cell Atlas end-to-end. The pipeline chains together STAR for alignment, Salmon for quantification, SUPPA2 for PSI (Percent Spliced In) calculation, and CellTypist for automated cell-type annotation. Everything gets compiled into a queryable SQLite database.
2. An R Package that provides programmatic access to the atlas. If you want to pull splicing data for a specific gene across cell types, or compare PSI values between conditions, you can do it from R without touching the raw data.
3. A Shiny Web Application for interactive exploration. Not everyone wants to write code to browse splicing patterns, so the web app provides a visual interface for querying and visualizing the atlas.
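To give a feel for what "queryable" means here, below is a minimal sketch of a per-gene lookup against a SQLite file. It is written in Python only to keep the example self-contained; the table name `psi`, its columns, the gene names, and the helper functions are all illustrative assumptions, not the actual scSpliceAtlas schema or API, and the numbers are made up.

```python
import sqlite3


def build_toy_atlas() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical per-cell PSI table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE psi (
            cell_id    TEXT,
            cell_type  TEXT,
            gene       TEXT,
            event_type TEXT,  -- SE, A5SS, A3SS, MXE or RI
            psi        REAL   -- NULL when the event could not be quantified
        )
        """
    )
    rows = [
        # Made-up values, for illustration only.
        ("c1", "T cell", "GENE_A", "SE", 0.90),
        ("c2", "T cell", "GENE_A", "SE", 0.70),
        ("c3", "B cell", "GENE_A", "SE", 0.20),
        ("c4", "B cell", "GENE_A", "SE", None),
        ("c5", "B cell", "GENE_B", "RI", 0.05),
    ]
    conn.executemany("INSERT INTO psi VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def mean_psi_by_cell_type(conn: sqlite3.Connection, gene: str):
    """Return (cell_type, mean PSI, number of quantified cells) for one gene."""
    query = """
        SELECT cell_type, AVG(psi), COUNT(psi)
        FROM psi
        WHERE gene = ?
        GROUP BY cell_type
        ORDER BY cell_type
    """
    # AVG and COUNT(column) both skip NULLs, so unquantified cells
    # do not drag the mean toward zero.
    return conn.execute(query, (gene,)).fetchall()


conn = build_toy_atlas()
for cell_type, mean_psi, n_cells in mean_psi_by_cell_type(conn, "GENE_A"):
    print(f"{cell_type}: mean PSI {mean_psi:.2f} over {n_cells} cells")
```

Storing one row per cell and event keeps this kind of aggregation a single SQL statement, which is the same style of question the R package and the web app are meant to answer.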
Five Flavors of Alternative Splicing
The atlas catalogs five major types of alternative splicing events:
- Skipped Exon (SE) — the most common type, where an exon is either included or excluded from the mature mRNA
- Alternative 5’ Splice Site (A5SS) — competing donor sites change the boundary at the 5’ end of an intron
- Alternative 3’ Splice Site (A3SS) — competing acceptor sites change the boundary at the 3’ end
- Mutually Exclusive Exons (MXE) — exactly one of two consecutive exons is included, never both
- Retained Intron (RI) — an intron remains in the mature transcript instead of being spliced out
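For each event type, we calculate PSI values at the single-cell level. PSI ranges from 0 (the inclusion form of the event is never used) to 1 (it is always used), so it gives you a continuous measure of how much of a gene’s output in each individual cell comes from a particular splicing isoform.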
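As a rough illustration of the arithmetic behind a PSI value, here is a small sketch of the transcript-abundance definition that tools such as SUPPA2 build on: the summed abundance of the transcripts containing the inclusion form, divided by the summed abundance of all transcripts that define the event. The function name, the transcript IDs, and the `min_total` cutoff are my own assumptions for the example, not the pipeline's actual code.

```python
from typing import Dict, Iterable, Optional


def psi_from_tpm(
    tpm: Dict[str, float],
    inclusion_ids: Iterable[str],
    all_ids: Iterable[str],
    min_total: float = 0.0,
) -> Optional[float]:
    """Compute PSI for one event in one cell from transcript TPMs.

    inclusion_ids: transcripts carrying the inclusion form of the event.
    all_ids: every transcript that defines the event (inclusion + exclusion).
    Returns None when total abundance is at or below min_total, because
    PSI is undefined if the gene is effectively not expressed in that cell.
    """
    inclusion = sum(tpm.get(t, 0.0) for t in inclusion_ids)
    total = sum(tpm.get(t, 0.0) for t in all_ids)
    if total <= min_total:
        return None
    return inclusion / total


# Hypothetical cell: tx1 includes the exon, tx2 skips it.
cell_tpm = {"tx1": 30.0, "tx2": 10.0}
print(psi_from_tpm(cell_tpm, ["tx1"], ["tx1", "tx2"]))  # 0.75
```

Returning `None` rather than 0 for unexpressed genes matters in single-cell data, where dropout is common and a missing measurement should not be read as "exon always skipped".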
Scale and Scope
The target scale for scSpliceAtlas is ambitious: over 100,000 cells spanning more than 20 cell types across at least 5 human tissues. Smart-seq2 was the platform of choice because, unlike droplet-based methods like 10x Genomics, it provides full-length transcript coverage, which is essential for accurate splicing quantification. The 3’-biased reads that most droplet platforms produce rarely span internal exon-exon junctions, so most splicing events cannot be reliably detected from them.
By drawing data from the Human Cell Atlas, we ensure that the atlas is built on a standardized, community-vetted dataset rather than a patchwork of studies with different protocols and quality standards.
Why This Matters
If you are studying cell differentiation, you might discover that a gene’s expression does not change, but its splicing pattern shifts dramatically as cells transition from one state to another. If you are investigating disease, you might find that a pathological cell type uses a rare splice variant that produces a dysfunctional protein. These are patterns you would completely miss with conventional gene-level analysis.
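The first scenario can be sketched in a few lines: compare two cell states on both gene-level expression and PSI, and look for cases where the first barely moves while the second shifts. This is a toy illustration with made-up numbers; the record layout, the state labels, and the `summarize_shift` helper are assumptions for the example, not part of scSpliceAtlas.

```python
from math import log2
from statistics import mean


def summarize_shift(cells, state_a, state_b):
    """Compare two cell states on expression and on PSI for one gene/event.

    cells: list of dicts with keys "state", "expression" and "psi"
    (psi may be None when the event was not quantifiable in that cell).
    """
    def state_means(state):
        selected = [c for c in cells if c["state"] == state]
        psi_values = [c["psi"] for c in selected if c["psi"] is not None]
        if not selected or not psi_values:
            raise ValueError(f"no usable cells for state {state!r}")
        return mean(c["expression"] for c in selected), mean(psi_values)

    expr_a, psi_a = state_means(state_a)
    expr_b, psi_b = state_means(state_b)
    return {
        # Pseudocount of 1 avoids division by zero for unexpressed genes.
        "log2_fold_change": log2((expr_b + 1) / (expr_a + 1)),
        "delta_psi": psi_b - psi_a,
    }


# Made-up cells: identical expression, very different exon inclusion.
toy_cells = [
    {"state": "progenitor", "expression": 50.0, "psi": 0.1},
    {"state": "progenitor", "expression": 50.0, "psi": 0.2},
    {"state": "differentiated", "expression": 50.0, "psi": 0.8},
    {"state": "differentiated", "expression": 50.0, "psi": 0.9},
]
print(summarize_shift(toy_cells, "progenitor", "differentiated"))
```

In this toy case the log2 fold change is zero, so a gene-level differential expression test would report nothing, while the PSI difference of roughly 0.7 shows the gene has switched isoforms.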
scSpliceAtlas makes these patterns accessible. Whether you are a computational biologist who wants to integrate splicing into your analysis pipeline, or a bench scientist who wants to explore splicing in your cell type of interest, the goal is to provide a ready-to-use resource.
What I Learned Building This
This project touched nearly every layer of building a bioinformatics resource:
- Database and resource design — structuring biological data so it is both computationally efficient and intuitive for users to query
- Large-scale data integration — pulling and harmonizing data from the Human Cell Atlas at scale
- Alternative splicing analysis — understanding the biology and the computational methods (PSI estimation, splicing event detection)
- Full-stack bioinformatics — designing a Snakemake pipeline, developing an R package, and building a Shiny web application as an integrated system
- R package development — proper package structure, documentation, and API design for bioinformatics tools
Building scSpliceAtlas reinforced my belief that the most impactful bioinformatics tools are not just algorithms — they are resources that make complex data accessible to the broader research community.
You can explore the code and documentation at github.com/loganylchen/scSpliceAtlas.