Landscape and Architecture of cis-regulatory Modules and Prediction of Their Functional Types, States and Target Genes

Doctoral Candidate Name: 
Sisi Yuan
Program: 
Bioinformatics and Computational Biology
Abstract: 

Cis-regulatory modules (CRMs) can function as enhancers and/or silencers to promote and repress, respectively, the transcription of their target genes in a spatiotemporal manner, thereby playing critical roles in virtually all biological processes. However, despite recent progress, our understanding of the precise locations, landscape and architecture of CRMs in terms of their transcription factor (TF) binding sites (TFBSs) in genomes, as well as their functional types (enhancer or silencer), states (active or inactive) and target genes in the various cell/tissue types of organisms, is still limited.
We have recently predicted comprehensive maps of CRMs and their constituent TFBSs in the human and mouse genomes, enabling us to investigate the organization and architecture of the CRMs in both genomes. We identify common rules governing this organization and architecture, and conclude that these rules are highly conserved between human and mouse.
Moreover, until recently, research focused largely on enhancers, and much less is known about silencers. To fill this gap, we develop two logistic regression models that predict the functional states of our previously predicted 1.2M CRMs as enhancers and silencers in any cell/tissue type, using data for five epigenetic marks. Applying the models to 56 human cell/tissue types with the required data available, we predict that 793,140 of the 1.2M CRMs are active as enhancers and/or silencers in at least one of these cell/tissue types, of which 14.8% and 28.6% function only as enhancers (enhancer-predominant) and only as silencers (silencer-predominant), respectively, while 10.6% function as both enhancers and silencers (dual functional). Thus, both dual functional CRMs and silencers might be more prevalent than previously assumed. Most dual functional CRMs function as either enhancers or silencers in different cell/tissue types (Type I), while some have dual functions, regulating different genes in the same cell/tissue type (Type II). Different types of CRMs display different lengths and TFBS densities, reflecting the complexity of their functions.
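As an illustration only, the two-model idea described above can be sketched as follows. This is a minimal sketch, not the models developed in the dissertation: the abstract does not name the five marks or report coefficients, so the generic five-value signal vector, the weights, the bias and the 0.5 threshold are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_active(signals, weights, bias, threshold=0.5):
    """Return (probability, is_active) for one CRM in one cell/tissue type.

    signals: five epigenetic mark signal values (marks unspecified here).
    """
    z = bias + sum(w * x for w, x in zip(weights, signals))
    p = sigmoid(z)
    return p, p >= threshold

# Hypothetical coefficients, chosen only to make the example run.
ENHANCER_MODEL = ([1.2, 0.8, 0.5, -0.9, 0.3], -1.0)
SILENCER_MODEL = ([-0.7, -0.4, 0.2, 1.5, 0.6], -1.0)

def classify_crm(signals):
    """Combine the two models into a per-cell-type call (assumed scheme)."""
    _, is_enhancer = predict_active(signals, *ENHANCER_MODEL)
    _, is_silencer = predict_active(signals, *SILENCER_MODEL)
    if is_enhancer and is_silencer:
        return "dual"
    if is_enhancer:
        return "enhancer"
    if is_silencer:
        return "silencer"
    return "inactive"
```

Calling `classify_crm` once per CRM and cell/tissue type, then aggregating the calls across all cell/tissue types, would be one way to separate enhancer-predominant, silencer-predominant and dual functional CRMs under these assumptions.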
Furthermore, identifying the target genes of predicted or experimentally validated CRMs remains a challenge due to the low quality of the predicted CRMs and the fact that CRMs often do not regulate their closest genes. To fill this gap, we develop a method, correlation and physical proximity (CAPP), to predict not only the CRMs’ target genes but also their functional types, using only chromatin accessibility (CA) and RNA-seq data in a panel of cell/tissue types plus Hi-C data in a few cell types. Applying CAPP to a panel of 107 human cell/tissue types with CA and RNA-seq data available, we predict target genes for 20% of the 1.2M CRMs, of which 4.5% are predicted as both enhancers and silencers (dual functional CRMs), 95.2% as exclusive enhancers and 0.3% as exclusive silencers. Different types of CRMs, as well as their target genes and regulatory links, exhibit distinct properties. CAPP predicts more enhancer-gene and silencer-gene links with higher accuracy than state-of-the-art methods.
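The general idea of combining correlation with physical proximity can be sketched as below. This is a hedged illustration, not the CAPP implementation: the abstract does not specify the correlation measure, the cutoffs, or how Hi-C data are used, so the Pearson correlation, the `r_cutoff` value, the boolean `in_contact` flag, and the rule that a positive correlation suggests an enhancer link and a negative one a silencer link are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def predict_link(ca, expression, in_contact, r_cutoff=0.5):
    """Call a CRM-gene link from a panel of cell/tissue types.

    ca:         chromatin accessibility of the CRM, one value per cell type
    expression: RNA-seq level of the candidate gene, same cell-type order
    in_contact: hypothetical flag, True if Hi-C supports physical proximity
    Returns "enhancer", "silencer", or None (no predicted link).
    """
    if not in_contact:
        return None
    r = pearson(ca, expression)
    if r >= r_cutoff:
        return "enhancer"   # accessibility rises with expression (assumed rule)
    if r <= -r_cutoff:
        return "silencer"   # accessibility rises as expression falls (assumed rule)
    return None
```

Under this sketch, a CRM linked to one gene as an enhancer and to another as a silencer would be counted as dual functional.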

Defense Date and Time: 
Tuesday, June 25, 2024 - 10:00am
Defense Location: 
Bioinformatics 408
Committee Chair's Name: 
Dr. Zhengchang Su
Committee Members: 
Dr. Abbe LaBella, Dr. Jun-tao Guo, Dr. Bao-Hua Song