Statistical Methods for the Deconvolution of Bulk Tissue RNA Sequencing Data

Doctoral Candidate Name: 
Su Xu
Program: 
Mathematics (Applied)
Abstract: 

Bulk RNA sequencing (RNA-seq) provides a cost-effective overview of gene expression but lacks the resolution to identify cell-type-specific contributions in heterogeneous tissues. Computational deconvolution methods address this by estimating cell-type proportions from bulk data, enabling finer biological insights. This dissertation develops and applies statistical frameworks to improve the accuracy and interpretability of deconvolution results.

We begin by reviewing RNA-seq technologies and the impact of cellular heterogeneity. Deconvolution is then framed as a nonnegative matrix factorization (NMF) problem, with attention to challenges such as non-uniqueness and noise sensitivity. Building on recent identifiability theory, we propose a geometric structure-guided NMF (GSNMF) that incorporates biological priors (such as marker genes) and local manifold structure to stabilize estimation.
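
To make the NMF framing concrete, the sketch below factorizes a bulk expression matrix X (genes by samples) as X ≈ WH, where W holds cell-type signatures and H holds mixing weights. It uses plain multiplicative updates for the Frobenius loss and is not GSNMF: the marker-gene priors and manifold terms described above are omitted. The function name `nmf_deconvolve`, its parameters, and the post hoc column normalization of H into proportions are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def nmf_deconvolve(X, k, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF, X ~= W @ H, via multiplicative updates (hypothetical helper).

    X : (n_genes, n_samples) nonnegative bulk expression matrix
    k : assumed number of cell types
    Returns W (signatures), H (raw weights), and column-normalized H.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Random strictly positive initialization
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Updates keep W and H nonnegative and do not increase ||X - WH||_F
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Assumption: rescale each sample's weights to sum to 1 so they can be
    # read as proportions; the scale shared between W and H is arbitrary.
    proportions = H / H.sum(axis=0, keepdims=True)
    return W, H, proportions
```

Different seeds can give different factorizations with similar reconstruction error, which is the non-uniqueness issue that additional structure is meant to address.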

To further enhance reference-free deconvolution, we introduce pseudo-bulk augmentation: a strategy that synthesizes single-cell-derived mixtures to enrich bulk data. This approach mitigates issues related to underdetermined solutions and improves robustness.
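
One way to picture pseudo-bulk augmentation is sketched below: cells are sampled from a labeled single-cell count matrix according to random cell-type proportions, and their counts are summed into synthetic mixtures with known composition. The Dirichlet sampling scheme, the function name `make_pseudo_bulk`, and all parameters are assumptions made for illustration; the abstract does not specify how the dissertation constructs its mixtures.

```python
import numpy as np

def make_pseudo_bulk(sc_counts, labels, n_mixtures, cells_per_mixture=200, seed=0):
    """Build synthetic bulk samples from single-cell counts (hypothetical helper).

    sc_counts : (n_genes, n_cells) single-cell count matrix
    labels    : length n_cells array of cell-type labels
    Returns pseudo-bulk matrix (n_genes, n_mixtures), the realized
    proportions (n_types, n_mixtures), and the ordered cell-type names.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    cell_types = np.unique(labels)
    n_genes = sc_counts.shape[0]
    pseudo_bulk = np.zeros((n_genes, n_mixtures))
    proportions = np.zeros((len(cell_types), n_mixtures))
    for j in range(n_mixtures):
        # Assumed scheme: draw target proportions from a flat Dirichlet
        p = rng.dirichlet(np.ones(len(cell_types)))
        n_per_type = rng.multinomial(cells_per_mixture, p)
        for ct, n in zip(cell_types, n_per_type):
            if n == 0:
                continue
            # Sample cells of this type with replacement and sum their counts
            idx = rng.choice(np.flatnonzero(labels == ct), size=n, replace=True)
            pseudo_bulk[:, j] += sc_counts[:, idx].sum(axis=1)
        # Record the proportions actually realized in this mixture
        proportions[:, j] = n_per_type / cells_per_mixture
    return pseudo_bulk, proportions, cell_types
```

Under this sketch, the synthetic columns could be appended to the real bulk matrix (for example with `np.hstack`) before factorization, after placing both on a comparable scale.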

A comprehensive benchmarking study compares reference-based and reference-free methods using metrics like correlation, root mean squared error, and mean absolute deviation. Results show that while high-quality reference data can improve performance, augmented reference-free approaches like GSNMF are highly effective when reference data are scarce. We conclude with future directions and ongoing challenges.
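
The three metrics named above can be computed from true and estimated proportion matrices roughly as follows. This is a sketch under assumptions: "mean absolute deviation" is read here as the mean absolute difference between estimated and true proportions, correlation is taken to be Pearson's, and all entries are pooled across cell types and samples. The function name `deconvolution_metrics` is hypothetical.

```python
import numpy as np

def deconvolution_metrics(p_true, p_est):
    """Compare estimated and true cell-type proportions (hypothetical helper)."""
    # Pool all (cell type, sample) entries into one vector each
    p_true = np.asarray(p_true, dtype=float).ravel()
    p_est = np.asarray(p_est, dtype=float).ravel()
    diff = p_est - p_true
    return {
        "pearson_r": float(np.corrcoef(p_true, p_est)[0, 1]),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        # Assumption: mean absolute deviation = mean |estimate - truth|
        "mad": float(np.mean(np.abs(diff))),
    }
```

For reference-free methods, the estimated components would first need to be matched to the true cell types, since the order of the factors is arbitrary.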

Defense Date and Time: 
Tuesday, April 8, 2025 - 9:00am
Defense Location: 
Fretwell 315
Committee Chairs' Names: 
Dr. Shaoyu Li, Dr. Duan Chen
Committee Members: 
Dr. Xue Wang, Dr. Daniel Janies