MoonFinder3: Hunting for 'Moonlighting' Long Non-Coding RNAs
If you have ever worked with long non-coding RNAs (lncRNAs), you know the feeling: thousands of them are differentially expressed, but figuring out what they actually do is a different beast entirely. During my PhD at the CUHK School of Life Sciences, I kept running into the same question: what if a single lncRNA participates in multiple, seemingly unrelated biological processes at the same time?
That question led me to develop MoonFinder3 and its algorithmic companion MPAGE. Here is the story of how they came to be, what they do, and what I learned building them.
What Are “Moonlighting” lncRNAs?
The term “moonlighting” is borrowed from protein biology, where a moonlighting protein performs more than one independent function. I applied the same idea to lncRNAs: a moonlighting lncRNA is one that participates in multiple distinct biological processes simultaneously. These molecules are fascinating because they may serve as hubs that coordinate cross-talk between pathways — but they are notoriously hard to detect with standard differential expression analysis alone.
The MPAGE Algorithm
The core idea behind MPAGE (module Pair-wise Analysis of Gene Expression) is deceptively simple: if we can measure how “diverse” a lncRNA’s functional associations are, we can rank them by their moonlighting potential.
Here is how it works:
- Build a co-expression network. Using WGCNA (Weighted Gene Co-expression Network Analysis), we construct a gene co-expression network from expression matrices and identify gene modules — clusters of co-expressed genes.
- Score functional diversity. For each lncRNA, we compute a score based on Simpson’s Diversity Index across the modules it is associated with. A lncRNA that connects to many functionally distinct modules gets a high score.
- Assess module similarity. We calculate semantic similarity between GO terms enriched in different modules using five methods — Resnik, Lin, Rel, Jiang, and Wang — via the GOSemSim package. This lets us quantify whether two modules are truly functionally distinct or just variations on the same theme.
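To make the diversity-scoring step concrete, here is a minimal sketch, written in Python for illustration rather than taken from the R package. It assumes each lncRNA's associations are summarized as a list of module labels, one per associated gene, and it uses the Gini-Simpson form of the index, 1 - sum(p_i^2). The function name `simpson_diversity` and the input layout are hypothetical, and the actual MPAGE score may weight modules differently.

```python
from collections import Counter


def simpson_diversity(module_labels):
    """Gini-Simpson index, 1 - sum(p_i^2), over the modules a lncRNA touches.

    module_labels: one module label per associated gene (hypothetical layout).
    Returns 0.0 when all associations fall in one module (or there are none)
    and approaches 1.0 as associations spread evenly over many modules.
    """
    counts = Counter(module_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Toy inputs (made up for illustration).
focused = ["M1"] * 9 + ["M2"]            # almost everything in one module
spread = ["M1", "M2", "M3", "M4"] * 3    # evenly spread over four modules

print(round(simpson_diversity(focused), 2))  # 0.18
print(round(simpson_diversity(spread), 2))   # 0.75
```

The evenly spread lncRNA scores higher, which is the ranking behavior described above. In a full pipeline, the semantic similarity step would then check that the modules being counted are functionally distinct and not near-duplicates.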
The mircluster Algorithm
Standard WGCNA module detection sometimes produces modules that are too coarse for our purposes. I developed a custom seed-growing algorithm called mircluster that works as follows:
- Start from high-degree nodes (hub genes) as seeds.
- Iteratively add boundary nodes that increase a synergy score — a metric that balances within-module cohesion against between-module separation.
- Remove nodes that no longer contribute to the synergy score.
- Repeat until convergence.
The result is a set of tighter, more functionally coherent modules that make the downstream diversity scoring more meaningful.
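The seed-growing loop can be sketched as follows. This is an illustration in Python under stated assumptions, not the mircluster implementation: the post does not define the synergy score, so the sketch assumes a simple stand-in, the fraction of a module's total edge weight that stays inside the module. The names `build_graph`, `synergy`, `grow_module`, and `detect_modules` are hypothetical.

```python
def build_graph(edges):
    """Undirected weighted graph as {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def synergy(graph, module):
    """ASSUMED score: internal edge weight / (internal + boundary edge weight)."""
    w_in = w_out = 0.0
    for u in module:
        for v, w in graph[u].items():
            if v in module:
                w_in += w   # each internal edge is seen from both endpoints
            else:
                w_out += w
    w_in /= 2.0
    total = w_in + w_out
    return w_in / total if total else 0.0


def grow_module(graph, seed, max_iter=100):
    """Grow a module from a seed, adding and pruning nodes until stable."""
    module = {seed}
    for _ in range(max_iter):
        changed = False
        score = synergy(graph, module)
        # Add step: take the boundary node that raises the score the most.
        boundary = sorted({v for u in module for v in graph[u]} - module)
        if boundary:
            best = max(boundary, key=lambda v: synergy(graph, module | {v}))
            best_score = synergy(graph, module | {best})
            if best_score > score:
                module.add(best)
                score, changed = best_score, True
        # Prune step: drop members whose removal raises the score.
        for u in sorted(module - {seed}):
            pruned_score = synergy(graph, module - {u})
            if pruned_score > score:
                module.remove(u)
                score, changed = pruned_score, True
        if not changed:   # converged: no addition or removal helps
            break
    return module


def detect_modules(graph):
    """Use high-degree nodes as seeds, skipping nodes already in a module."""
    by_degree = sorted(graph, key=lambda n: (-sum(graph[n].values()), n))
    assigned, modules = set(), []
    for seed in by_degree:
        if seed in assigned:
            continue
        module = grow_module(graph, seed)
        modules.append(module)
        assigned |= module
    return modules


# Toy network: two tight triangles joined by one weak edge.
graph = build_graph([
    ("a1", "a2", 1.0), ("a1", "a3", 1.0), ("a2", "a3", 1.0),
    ("b1", "b2", 1.0), ("b1", "b3", 1.0), ("b2", "b3", 1.0),
    ("a1", "b1", 0.1),
])
modules = detect_modules(graph)
print([sorted(m) for m in modules])
```

On this toy network the weak bridge is never absorbed, so the two triangles come out as separate modules. Every accepted change strictly increases the score, which is why the loop terminates; `max_iter` is only a safeguard.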
Survival Analysis and the Shiny Browser
Identifying moonlighting lncRNAs is only half the battle — you also need to know whether they matter clinically. MoonFinder3 includes built-in survival analysis using Kaplan-Meier curves and log-rank tests, so you can test whether patients stratified by the expression of a candidate moonlighting lncRNA show significantly different outcomes.
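For readers who want to see what that stratified comparison involves, here is a self-contained Python sketch of a Kaplan-Meier estimator and a two-group log-rank test. It is illustrative only and does not reproduce MoonFinder3's R functions. The median split on expression, the function names, and the toy cohort are all assumptions made for the example.

```python
import math
import statistics


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event flag 1 = death)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in pooled if e}):
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy cohort (made-up values): lncRNA expression, follow-up in months, event flag.
expression = [0.2, 0.5, 0.9, 1.4, 2.1, 2.8]
months = [4, 5, 6, 1, 2, 3]
events = [1, 1, 1, 1, 1, 1]

# Assumed stratification: split patients at the median expression.
cutoff = statistics.median(expression)
high = [i for i, x in enumerate(expression) if x > cutoff]
low = [i for i, x in enumerate(expression) if x <= cutoff]

stat, p_value = logrank_test(
    [months[i] for i in high], [events[i] for i in high],
    [months[i] for i in low], [events[i] for i in low],
)
print(kaplan_meier([months[i] for i in high], [events[i] for i in high]))
print(f"log-rank chi2 = {stat:.2f}, p = {p_value:.3f}")
```

A median split is only one possible cutoff. With real cohorts, censored patients (event flag 0) stay in the at-risk counts until their last follow-up time, which both functions handle.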
To make exploration easier, I also built an interactive Shiny browser. You can load your results, click through lncRNAs, visualize their module associations, and inspect survival plots — all without writing a single line of code.
A Fun Piece of History
Here is something that still makes me smile: MoonFinder v1.x was originally an astronomical moon phase calculator. I had written it as a small side project to predict lunar phases. When I started working on the lncRNA project, the name felt too perfect to abandon — after all, we are “finding” moonlighting RNAs. So v2.0 was a complete rewrite that turned an astronomy toy into a bioinformatics package. By v3, it had matured into the tool described here.
MPAGE vs. MoonFinder3
To keep things modular, I split the work into two packages:
- MPAGE is the algorithmic core. It implements the module detection, diversity scoring, and semantic similarity calculations as reusable R functions. You can use it with any gene expression data, not just lncRNAs.
- MoonFinder3 is the application layer. It wraps MPAGE specifically for lncRNA biology, adding survival analysis, the Shiny browser, and lncRNA-specific utilities.
What I Learned
Building these packages taught me a tremendous amount:
- R package engineering: Proper namespace management, S3/S4 methods, CRAN/Bioconductor submission standards, and writing vignettes that people can actually follow.
- WGCNA and graph algorithms: Working with adjacency matrices, TOM (Topological Overlap Matrix), and module detection gave me a deep appreciation for network biology.
- Semantic similarity: Understanding how Gene Ontology DAGs work and how different similarity measures (Resnik vs. Wang, for instance) capture different aspects of functional relatedness.
- Survival analysis: Kaplan-Meier estimators, Cox proportional hazards, log-rank tests — the statistical toolbox for clinical genomics.
- Bioconductor development: Navigating the Bioconductor ecosystem, following their coding standards, and understanding how packages interoperate.
If you are interested in exploring moonlighting lncRNAs in your own data, check out MoonFinder3 and MPAGE on GitHub. Contributions and feedback are always welcome.