07/21/2025
By Madhavi Suyog Pagare
The Kennedy College of Sciences, Miner School of Computer & Information Sciences, invites you to attend a doctoral thesis defense by Madhavi Pagare on "Causal Learning-Enabled Hierarchical Classification of Sociocultural Factors from Clinical Notes for Understanding Opioid Use Disorder and Breast Cancer Recurrence."
Candidate: Madhavi Pagare
Location: DAN 309 and via Zoom
Date: July 31, 2025
Time: 10 to 11 a.m.
Committee:
- Chair: Mohammad Arif Ul Alam, Assistant Professor, Miner School of Computer & Information Sciences, Biomedical Engineering and Biotechnology
- Benyuan Liu, Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH), Computer Networking Lab, CHORDS
- Yu Cao, Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH)
- Dan Berlowitz, Professor, Co-Director - CHORDS, Zuckerberg College of Health Sciences, Public Health, UMass Lowell
- Inyene Essien-Aleksi, Assistant Professor, School of Nursing and Health Sciences, Merrimack College
Abstract:
Sociocultural factors such as living conditions, income levels, trauma, perceived discrimination and social support networks are critical in shaping individuals’ experiences and outcomes, influencing both clinical and social problems. These factors are particularly significant in opioid abuse recovery. They have traditionally been identified through community and personal surveys, which often suffer from self-report biases and limited question scope. Written text narratives, such as clinical notes, offer a richer source of information for extracting these determinants. Natural language processing (NLP) methods, including Transformer-based models, can effectively analyze these narratives to uncover sociocultural factors that traditional surveys might miss. Combining NLP-based detection with causal inference makes it possible to evaluate the impact of interventions targeting these factors, providing a deeper understanding and more accurate measurement of their effects on outcomes like opioid abuse recovery.
This doctoral thesis leverages NLP, Transformers, and LLM-augmented labeling for hierarchical classification of sociocultural factors in text narratives, followed by a Siamese neural network-based subgroup discovery technique for causal inference to assess the impact of these factors on outcomes. The initial phase of this research uses Large Language Models (LLMs) to develop an automated prediction system for Sociocultural Factors of Mental Health (SFOMH) in opioid abuse patients, using their discharge summaries. We introduce a novel Human-in-the-Loop-LLM Interaction for Annotation (HLLIA) methodology to enable efficient and precise labeling of texts related to SFOMHs. Building on this foundation, we design a Multilevel Hierarchical Clinical-Longformer (MHCL) classification algorithm to predict these determinants in clinical notes.
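As a rough illustration of what hierarchical classification means in this setting, the sketch below keeps a fine-grained label only when its parent category is also predicted. It is not the MHCL algorithm itself, whose details are not given here. The label names, the two-level hierarchy, and the 0.5 threshold are all hypothetical, and the probabilities are assumed to come from some upstream text classifier.

```python
from typing import Dict, List

# Hypothetical two-level label hierarchy; names are illustrative only
# and are not taken from the thesis.
HIERARCHY: Dict[str, List[str]] = {
    "social_support": ["family_support", "peer_support"],
    "economic": ["housing_instability", "low_income"],
}


def hierarchical_predict(
    parent_probs: Dict[str, float],
    child_probs: Dict[str, float],
    threshold: float = 0.5,  # assumed decision threshold
) -> Dict[str, List[str]]:
    """Return predicted parents mapped to their predicted children.

    A child label is kept only if its parent also passes the threshold,
    so the output always respects the hierarchy.
    """
    predicted: Dict[str, List[str]] = {}
    for parent, children in HIERARCHY.items():
        # Skip the whole branch when the parent category is not predicted.
        if parent_probs.get(parent, 0.0) < threshold:
            continue
        predicted[parent] = [
            child for child in children
            if child_probs.get(child, 0.0) >= threshold
        ]
    return predicted
```

In this sketch a high child probability under an unpredicted parent is discarded, which is one simple way to keep multi-level predictions consistent.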
In the subsequent phase, we investigate the causal relationships between SFOMHs and Opioid Use Disorder (OUD) outcomes. Although previous studies have established correlations between social determinants and OUD, the causal links remain largely unexplored due to the lack of robust causal models. To address this gap, we propose a two-step causal effects identification framework. First, the MHCL model is employed to detect social determinants within unstructured clinical notes. Second, we develop a Siamese Neural Network-based subgroup discovery technique to ascertain the causal effects of these determinants on OUD progression. This approach leverages the Siamese architecture’s capability to handle complex relationships, enhancing the precision of causal inference.
Finally, we adapted this hierarchical classification and causal inference pipeline to the challenge of Breast Cancer Recurrence (BCR). Using our newly curated SFOMH-OncoBreast-Clinic corpus of de-identified oncology discharge summaries, we first applied a Clinical-Longformer Multi-Task Multi-Label Classifier (CLMT-MLC) to extract 22 mental-health–related social determinants from free-text clinical notes. These structured determinants were then analyzed using a Siamese Neural Network–based Subgroup Discovery (SNN-SD) approach, alongside IPTW-RA estimators and a Causal Effect Variational AutoEncoder (CEVAE), to generate determinant-specific Conditional Average Treatment Effects (CATEs). This multi-resolver framework uncovered actionable drivers of BCR and provided individualized causal evidence to inform personalized survivorship care.
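For readers unfamiliar with inverse probability of treatment weighting (IPTW), the sketch below shows the weighting step in its simplest normalized form. It is a minimal illustration under stated assumptions, not the IPTW-RA estimator used in the thesis: it omits the regression adjustment, it assumes propensity scores have already been estimated, and the function name and inputs are hypothetical. Restricting the inputs to one subgroup of patients would give a subgroup-level estimate.

```python
from typing import Sequence


def iptw_ate(
    treated: Sequence[int],       # 1 if the determinant is present, else 0
    outcome: Sequence[float],     # observed outcome per patient
    propensity: Sequence[float],  # assumed pre-estimated P(treated = 1 | covariates)
) -> float:
    """Normalized (Hajek) IPTW estimate of the average treatment effect."""
    if not (len(treated) == len(outcome) == len(propensity)):
        raise ValueError("inputs must have the same length")
    if any(p <= 0.0 or p >= 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Treated units are weighted by 1/p, untreated units by 1/(1 - p).
    w1 = [t / p for t, p in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    if sum(w1) == 0 or sum(w0) == 0:
        raise ValueError("need at least one treated and one untreated unit")

    # Weighted outcome means under treatment and under no treatment.
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0
```

The weights rebalance the two groups so that each resembles the full population on the covariates behind the propensity scores. The estimate is only as good as those scores.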
Validating our model across clinical domains demonstrated the generalizability and robustness of our approach, deepening our understanding of the broader impact of sociocultural factors and informing targeted interventions across diverse populations.