BUILDING COMPUTATIONAL REPRESENTATIONS OF MEDICAL TEXTS USING LARGE LANGUAGE MODELS

Doctoral Candidate Name: 
Seethalakshmi Gopalakrishnan
Program: 
Computing and Information Systems
Abstract: 

This dissertation explores the potential of natural language models, including large language models, to extract causal relations from medical texts, specifically Clinical Practice Guidelines. We present the outcomes of causality extraction from Clinical Practice Guidelines on gestational diabetes, a first in the field. We also release the first annotated corpus of causal statements in Clinical Practice Guidelines.
We address the challenge of classifying causal sentences at the inter-sentence level with a small amount of annotated data by treating it as a cross-domain transfer-learning problem. Obtaining these classified sentences is the first step in extracting causality. Furthermore, we examine the importance of modal verbs and the degree of influence from cause to effect, and show that three models (BERT, DistilBERT, and BioBERT) can identify the degree of influence expressed in the text.
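To illustrate the intuition behind modal verbs signaling degree of influence, the following is a minimal rule-based sketch, not the dissertation's model: the modal-to-degree mapping and the label names ("strong", "moderate", "weak") are assumptions for illustration only; the dissertation uses trained BERT-family classifiers rather than rules.

```python
import re

# Hypothetical mapping from modal verb to a coarse degree-of-influence label.
MODAL_DEGREE = {
    "must": "strong",
    "shall": "strong",
    "should": "moderate",
    "may": "weak",
    "might": "weak",
    "could": "weak",
}

def degree_of_influence(sentence: str) -> str:
    """Return a coarse degree label based on the first modal verb found."""
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in MODAL_DEGREE:
            return MODAL_DEGREE[token]
    return "unspecified"

print(degree_of_influence("Patients with gestational diabetes should be screened."))
# -> moderate
```

A rule-based baseline like this is brittle (modals can be negated or scoped differently), which is one motivation for learning the degree of influence from annotated examples instead.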
Lastly, we tackle the sparsity of annotated data for causality extraction from Clinical Practice Guidelines, again using transfer learning. We investigate zero-shot and few-shot approaches to cross-domain transfer learning and quantify the link between data similarity and transfer success. With cross-domain few-shot transfer learning, we achieve an F1-score of 81%, which suggests transfer learning is a viable way to address the limited availability of annotated data.
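The link between data similarity and transfer success can be made concrete with a toy similarity measure. The sketch below uses simple vocabulary overlap (Jaccard similarity) between a source and a target corpus; this is an illustrative assumption, and the dissertation's actual similarity metric may differ. The expectation is that higher source-target similarity predicts better cross-domain transfer.

```python
def vocab(corpus):
    """Lower-cased word set of a corpus given as a list of sentences."""
    words = set()
    for sentence in corpus:
        words.update(sentence.lower().split())
    return words

def jaccard_similarity(source_corpus, target_corpus):
    """Jaccard overlap between the vocabularies of two corpora."""
    a, b = vocab(source_corpus), vocab(target_corpus)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy example: a general-domain causal corpus vs. a clinical target corpus.
source = ["smoking causes lung damage", "stress may cause hypertension"]
target = ["hyperglycemia may cause fetal complications"]
print(round(jaccard_similarity(source, target), 3))
# -> 0.182
```

In practice one would compare such a similarity score against the few-shot F1 achieved on each source-target pair to quantify the correlation.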

Defense Date and Time: 
Tuesday, November 7, 2023 - 10:00am
Defense Location: 
https://charlotte-edu.zoom.us/j/98080642229?pwd=QXRhZXBBcTF2YmFrVmpkSlBSMkkvQT09
Committee Chair's Name: 
Dr. Wlodek Zadrozny