03/24/2026
By Dipika Boro

The Miner School of Computer and Information Sciences, Department of Computer Science, invites you to attend a doctoral dissertation defense by Dipika Boro on “Advancing Endoscopic Polyp Detection, Segmentation and Characterization using Pretrained Deep Architectures and Self-Supervised Representation Learning.”

Candidate Name: Dipika Boro
Degree: Doctoral
Defense Date: Monday, April 6, 2026
Time: 10 - 11 a.m. EST
Location: This will be a virtual defense via Zoom
Dissertation Title: “Advancing Endoscopic Polyp Detection, Segmentation and Characterization using Pretrained Deep Architectures and Self-Supervised Representation Learning”

Committee Members:

  • Yu Cao (Advisor), Professor, Miner School of Computer & Information Sciences; Director, UMass Center for Digital Health (CDH)
  • Benyuan Liu (Advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH), Computer Networking Lab, CHORDS
  • Hengyong Yu (Member), Professor, FIEEE, FAAPM, FAIMBE, FAAIA, FAIIA, Department of Electrical and Computer Engineering
  • Qilei Chen (Member), Research Scientist, Miner School of Computer and Information Sciences

Abstract:

Colorectal cancer is the third most common cancer worldwide and the second leading cause of cancer-related deaths, with a concerning rise in incidence among younger adults despite overall declines. Routine endoscopic screening and early detection are critical for improving patient outcomes. The video data produced by these screenings offers significant potential for training deep learning models. However, large-scale annotation remains challenging due to the need for clinical expertise, privacy constraints, and increasing clinician workload. While transfer learning has been widely used to mitigate limited labeled data, self-supervised learning (SSL) has emerged as a promising alternative, though challenges in generalization and interpretability continue to limit clinical adoption. This dissertation addresses these challenges by developing approaches that mitigate data scarcity and narrow the gap between research and real-world deployment in endoscopic imaging.

First, we present an exploratory study of cross-domain transfer learning for polyp segmentation, evaluating the effectiveness of different pretraining strategies under limited labeled data conditions. We compare models pretrained on natural images with those pretrained on diverse medical imaging modalities, including histopathology, CT, and MRI. Using both convolutional and transformer-based backbones with a DeepLabV3+ decoder, we conduct extensive experiments to systematically assess their effectiveness across multiple public benchmarks. Our results demonstrate that modality alignment plays a critical role, with models pretrained on natural images consistently outperforming those pretrained on radiological and other medical modalities.

To further address data scarcity, we introduce EndoMAE, a self-supervised foundation model for endoscopic image analysis. We curate a large-scale dataset comprising over 10 million unlabeled endoscopic frames from clinical videos and public sources, making it the largest dataset used to date for pretraining in this domain. EndoMAE follows the Masked Autoencoder framework, learning domain-specific representations through masked image reconstruction. Evaluations on downstream classification and segmentation tasks demonstrate strong generalization and improved performance over both supervised and self-supervised baselines, underscoring the effectiveness of SSL for building robust endoscopic models.

Finally, we propose a multi-modal, geometry-aware framework for polyp size estimation that integrates visual, depth, and geometric information to overcome the limitations of purely 2D analysis. The pipeline combines lesion detection, segmentation, and monocular depth estimation, followed by a novel MGPS-Net architecture that fuses RGB, mask, depth, and geometry-derived features through an attention mechanism for accurate size prediction. Extensive ablation studies demonstrate the complementary contributions of geometric features and attention, and the full model outperforms geometry-only regression baselines. In addition, an auxiliary Paris classification task provides complementary morphological characterization. The framework further enables geometry-driven analysis and 3D visualization, offering intuitive representations of polyp structure. By combining accurate polyp size estimation with morphological classification and 3D visualization, this work enables comprehensive, clinically relevant polyp characterization, supporting the development of next-generation computer-aided tools for endoscopic analysis.