03/17/2021
By Ying Li

Title: Deep Learning for Medical Video Analysis and Understanding

Ph.D. Candidate: Ying Li
Time: Friday, April 9, 2021, 9 a.m.
Location: This will be a virtual defense via Zoom.

Committee Members:

  • Yu Cao (advisor), Professor, Computer Science Department, University of Massachusetts Lowell
  • Benyuan Liu (advisor), Professor, Computer Science Department, University of Massachusetts Lowell
  • Yan Luo, Professor, Electrical & Computer Engineering Department, University of Massachusetts Lowell

Abstract:

Analyzing and understanding medical videos and images is important yet challenging. The difficulty stems not only from the complexity of the medical domain knowledge needed to interpret clinical manifestations, but also from the wide variety of input modalities with vastly different characteristics, including X-ray images, computed tomography (CT) scans, magnetic resonance imaging (MRI), and RGB videos. Each type of image or video may require a special design in both network structure and training strategy. To address these challenges, we first collaborate with experts in the targeted specialty areas and then focus on tasks that use images or videos from RGB cameras.

In this dissertation, we focus on the following two tasks: 1) an in-home lower body rehabilitation system based on human pose estimation, which can help patients complete their physical therapy activities at home by themselves instead of under the guidance of a physical therapist in a clinic; and 2) real-time ileocecal valve detection in colonoscopy videos, which can be used to evaluate the start time of withdrawal in colonoscopy videos and is important for determining the adenoma detection rate (ADR) and polyp detection rate (PDR), both of which are important quality indicators of a colonoscopy.

For the first task, we design, develop, and evaluate an in-home lower body rehabilitation system based on a novel lightweight human pose estimation model. To achieve this, we first create a lower body rehabilitation dataset of 500,000 images, each annotated with ground-truth joint locations. We then design a lightweight but powerful neural network model, which runs on a smartphone, to estimate human pose. Furthermore, we develop a set of principles for evaluating patients' in-home rehabilitation activities in terms of range of motion and duration. To protect privacy, all data collected from patients are encrypted, stored, and processed locally on the patients' own smartphones.
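
As an illustration of how range of motion and duration might be derived from estimated joint locations, the following is a minimal sketch and not the dissertation's actual evaluation method. It assumes each frame provides 2D (x, y) coordinates for the hip, knee, and ankle; the function names and the angle threshold are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: coincident keypoints")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


def range_of_motion(frames):
    """Return (min_angle, max_angle) of the knee over a sequence of frames.

    Each frame is a (hip, knee, ankle) tuple of (x, y) points (assumed format).
    """
    angles = [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]
    return min(angles), max(angles)


def bent_duration(frames, fps, max_angle_deg):
    """Seconds during which the knee angle is at or below max_angle_deg."""
    bent = sum(
        1 for hip, knee, ankle in frames
        if joint_angle(hip, knee, ankle) <= max_angle_deg
    )
    return bent / fps


# Hypothetical two-frame sequence: a straight leg, then a right-angle bend.
frames = [
    ((0, 0), (0, 1), (0, 2)),
    ((0, 0), (0, 1), (1, 1)),
]
lo, hi = range_of_motion(frames)
seconds_bent = bent_duration(frames, fps=2, max_angle_deg=120)
```

A real system would also need to handle missed or low-confidence keypoints and camera perspective, which this sketch ignores.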

For the second task, we present a colonoscopy video processing system that detects the ileocecal valve in real time by using a convolutional neural network. We collect a novel dataset of colonoscopy images and videos to train and evaluate the classifier. We explore a range of state-of-the-art classifier architectures, and our best model achieves 99.6% accuracy on the image-level dataset.
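
The abstract does not say how per-frame detections are turned into a withdrawal start time. As one possible illustration only, the sketch below assumes the classifier emits a per-frame probability that the ileocecal valve is visible, and takes the first run of consecutive frames above a threshold as the detection point. The function names, threshold, and run length are hypothetical.

```python
def first_stable_detection(scores, threshold=0.5, min_run=3):
    """Index of the first frame that starts min_run consecutive scores
    at or above threshold, or None if no such run exists."""
    run = 0
    for i, score in enumerate(scores):
        if score >= threshold:
            run += 1
            if run == min_run:
                return i - min_run + 1
        else:
            run = 0  # an isolated detection is treated as noise
    return None


def frame_to_seconds(frame_index, fps):
    """Convert a frame index to a timestamp in seconds."""
    return frame_index / fps


# Hypothetical per-frame probabilities from a classifier.
scores = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.3]
start_frame = first_stable_detection(scores, threshold=0.5, min_run=3)
start_time = frame_to_seconds(start_frame, fps=30)
```

Requiring several consecutive positive frames trades a small delay for fewer false alarms from single misclassified frames.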