11/06/2025
By Alimire Nabijiang
The Kennedy College of Science, Richard A. Miner School of Computer & Information Sciences, invites you to attend a doctoral dissertation defense by Alimire Nabijiang titled, "Deep Learning Pipelines for Polyp Size Estimation and Monocular Depth Enhancement in Colonoscopy."
Candidate Name: Alimire Nabijiang
Date: Thursday, November 20, 2025
Time: 9 - 10 a.m. EST
Location: This will be a virtual defense via Zoom.
Committee Members:
- Yu Cao (Advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH)
- Benyuan Liu (Advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH), Computer Networking Lab, CHORDS
- Hengyong Yu (Member), Professor, FIEEE, FAAPM, FAIMBE, FAAIA, FAIIA, Department of Electrical and Computer Engineering
- QiLei Chen (Member), Research Scientist, Miner School of Computer & Information Sciences
Abstract:
Colorectal cancer (CRC) remains a major global health burden, and early detection through colonoscopy plays a critical role in reducing its incidence and mortality. Polyp size is a key clinical parameter influencing surveillance intervals, therapeutic strategies, and long-term follow-up. In current practice, size assessment is primarily limited to linear metrics, particularly maximum diameter. While alternative metrics such as surface area and volume offer richer characterizations of polyp morphology, they remain under-explored in clinical research. Diameter-based measurement is also highly subjective and often inaccurate, particularly when it is estimated visually during procedures. These limitations underscore the need for deep learning-based methods that can deliver automated, objective, and clinically practical polyp sizing solutions. Central to such methods is metric monocular depth estimation, which powers polyp sizing and other downstream tasks; however, achieving reliable metric depth on real colonoscopic imagery remains challenging. This thesis focuses on developing deep learning methods that improve polyp size estimation and enhance monocular depth estimation for colonoscopic imagery.
The first contribution of this research introduces a novel pipeline for estimating polyp surface area from monocular colonoscopic images. Conventional diameter-based methods fail to represent complex, protuberant 3D geometries. Our approach combines a novel metric depth estimation network that uses a canonical camera space transformation, robust segmentation, and a Poisson surface reconstruction algorithm to generate 3D surface models from a monocular frame and compute the lesion’s surface area. While the use of surface area as a metric is still theoretical, our results on synthetic datasets demonstrate the technical feasibility of our approach and lay the groundwork for future clinical applications in which surface area could complement existing sizing metrics.
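As an illustration of the last step only, the surface area of a reconstructed triangle mesh can be obtained by summing the areas of its triangles. The following is a minimal Python sketch under stated assumptions, not the thesis implementation: the function names and the vertex/face mesh layout are hypothetical, and the depth estimation, segmentation, and Poisson reconstruction stages are not shown.

```python
import math

def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def mesh_surface_area(vertices, faces):
    """Sum triangle areas over a mesh.

    vertices: list of (x, y, z) points in metric units (assumed layout).
    faces: list of (i, j, k) index triples into vertices (assumed layout).
    """
    return sum(
        triangle_area(vertices[i], vertices[j], vertices[k])
        for i, j, k in faces
    )

# Usage: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
square_area = mesh_surface_area(square_vertices, square_faces)
```

If the vertices are expressed in millimeters, the result is in square millimeters, which is why metric (rather than relative) depth matters for this measurement.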
The second contribution presents a practical and clinically aligned pipeline for estimating polyp diameter. We develop a robust end-to-end pipeline that segments the polyp and applies ViTCAN-Depth, a novel monocular depth model that fuses parallel CNN and Vision Transformer encoders via channel-attention gating. This is coupled with a novel Depth–Pixel Linear (DPL) module, a lightweight component that estimates polyp diameter in real-world units using a learned scalar, thereby eliminating the need for manual calibration or reference tools. Quantitative and qualitative evaluations on synthetic and real colonoscopy frames show that our approach outperforms existing methods and maintains strong performance across varied polyps and temporal variations.
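The abstract does not give the DPL formulation. One plausible reading, offered here only as an assumption, follows the pinhole camera relation, in which real-world size is proportional to pixel extent times depth, with a single scalar standing in for the inverse focal length. The Python sketch below uses hypothetical names (`estimate_diameter`, `fit_scale`) and fits that scalar by least squares; it is not the module described in the thesis.

```python
def estimate_diameter(pixel_extent, depth, scale):
    """Diameter in real-world units as scale * depth * pixel extent.

    Assumption: a pinhole-style linear relation; `scale` plays the role
    of 1 / focal length (in pixels) and is learned rather than calibrated.
    """
    return scale * depth * pixel_extent

def fit_scale(samples):
    """Least-squares fit of the scalar from (pixel_extent, depth, diameter) samples.

    Minimizes sum((scale * x - y)^2) with x = pixel_extent * depth,
    which gives scale = sum(x * y) / sum(x * x).
    """
    num = 0.0
    den = 0.0
    for pixel_extent, depth, diameter in samples:
        x = pixel_extent * depth
        num += x * diameter
        den += x * x
    return num / den

# Usage with made-up training samples: (pixels, depth in mm, diameter in mm).
training_samples = [(100.0, 20.0, 4.0), (50.0, 30.0, 3.0), (200.0, 10.0, 4.0)]
learned_scale = fit_scale(training_samples)
predicted_mm = estimate_diameter(80.0, 25.0, learned_scale)
```

The sample values are illustrative only and are not taken from the thesis.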
Finally, we develop a two‑phase, teacher–student, semi‑supervised framework that advances monocular depth estimation for real‑time colonoscopy by balancing accuracy and efficiency. A high‑capacity teacher—trained on synthetic colonoscopy frames with ground‑truth depth—generates pseudo‑labels for large‑scale, unlabeled clinical images. A lightweight student with a DINOv2 encoder and a multi‑scale fusion decoder is then optimized using our Unified Knowledge Distillation (UKD) loss comprising a whole‑image SiLog term that anchors the global metric scale and an edge‑guided, patch‑wise SiLog term that preserves clinically salient local structure. By leveraging abundant unlabeled data together with this well‑designed objective, the framework enables a compact model to achieve real‑time inference with competitive metric accuracy, effectively balancing fine‑detail preservation and global scale consistency for clinical application.
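For readers unfamiliar with the SiLog (scale-invariant logarithmic) loss, the sketch below shows its commonly used form and one way a whole-image term and a patch-wise term might be combined. It is a minimal Python illustration under stated assumptions: the patch rectangles, the `patch_weight` value, the `lam` default, and all function names are hypothetical, and the edge-guided patch selection used in UKD is not reproduced.

```python
import math

def silog(pred, target, lam=0.85):
    """SiLog over paired positive depths: sqrt(mean(d^2) - lam * mean(d)^2),
    where d = log(pred) - log(target)."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_sq - lam * mean * mean, 0.0))

def crop(depth_map, patch):
    """Flatten the rectangle (r0, r1, c0, c1) of a 2D list, end-exclusive."""
    r0, r1, c0, c1 = patch
    return [depth_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]

def patchwise_silog(pred, target, patches, lam=0.85):
    """Mean SiLog over patches; here the patches are simply given as input."""
    losses = [silog(crop(pred, p), crop(target, p), lam) for p in patches]
    return sum(losses) / len(losses)

def combined_loss(pred, target, patches, patch_weight=0.5, lam=0.85):
    """Whole-image SiLog plus a weighted patch-wise SiLog (assumed weighting)."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    global_term = silog(flat_pred, flat_target, lam)
    local_term = patchwise_silog(pred, target, patches, lam)
    return global_term + patch_weight * local_term

# Usage: a 2x2 student prediction against a teacher pseudo-label.
teacher = [[10.0, 12.0], [20.0, 24.0]]
student = [[11.0, 12.0], [19.0, 25.0]]
loss = combined_loss(student, teacher, patches=[(0, 1, 0, 2), (1, 2, 0, 2)])
```

With `lam = 1` the SiLog term ignores a uniform scale error entirely, while smaller values penalize it; this is why a whole-image term with `lam < 1` can help anchor global metric scale while patch-wise terms focus on local structure.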
Collectively, these contributions aim to develop more accurate, efficient, and clinically meaningful solutions for polyp size and depth estimation in colonoscopy, paving the way for real-time, AI-assisted diagnostic tools and future clinical applications.