11/23/2022
By Chenxi Zhang

Title: Deep learning for automatic endoscope assist system, from disease detection to quality control

Ph.D. Candidate: Chenxi Zhang
Time: Tuesday, November 29, 2022, 10 a.m. Eastern Time
Location: This will be a virtual defense via Zoom

Committee Members:
Yu Cao, Ph.D., (advisor), Professor, Director, UMass Center for Digital Health (CDH), Miner School of Computer & Information Sciences
Benyuan Liu, Ph.D., (advisor), Professor, Director, Miner School of Computer & Information Sciences
Mohammad Arif Ul Alam, Ph.D., (member), Assistant Professor, Miner School of Computer & Information Sciences
Heyong Yu, Ph.D., (member), Professor, Department of Electrical & Computer Engineering

Abstract:
The past decade has witnessed the rise of Convolutional Neural Networks (CNNs) in deep learning. CNN models have been successful in a variety of application domains, such as Computer Vision (CV), Natural Language Processing (NLP), and Automatic Speech Recognition (ASR). Since 2017, transformer-based models have dominated Natural Language Processing, and the vision transformer is even capable of outperforming CNNs on Computer Vision tasks. The outstanding achievements of CNNs and transformers indicate their potential in the field of health care. For example, endoscopic surgery has become an important application scenario for CNN-based deep learning methods. The purpose of this dissertation is to explore in detail CNN-based and transformer-based systems used in various aspects of endoscopic surgery, including algorithms for quality control in colonoscopy and disease classification algorithms that assist physicians in improving diagnosis.

Over one-third of the world's population suffers from digestive tract diseases, ranging from gastric erosion, ulcers, and intestinal polyps to severe illnesses such as cancer. While most of these conditions are mild, some can progress to cancer, so early detection and treatment of gastrointestinal diseases are critical. Endoscopy is a standard screening procedure in which a camera attached to a flexible tube is inserted into the patient's digestive tract to visualize the upper and lower digestive system. However, the procedure requires manual inspection of the video feed, which can lead to human error due to fatigue or a doctor's lack of experience. The ultimate goal of our project is to develop a system that assists physicians in improving their surgical skills, diagnostic precision, and efficiency during surgery.

For an operation to be performed to the highest standard, the doctor must pay attention to many operational details throughout the entire process. For instance, a high-quality endoscopy procedure may be compromised by the presence of various types of artifacts, such as pixel saturation, motion blur, defocus, specular reflections, bubbles, fluid, and debris. These artifacts not only complicate the examination of the underlying tissues during diagnosis but also affect the post-analysis methods needed for follow-ups (e.g., retrieval of video frames for reporting). Likewise, a good gastroscopy screening should include clear footage of all relevant areas (esophagus, gastric body, gastric antrum, duodenum, descending duodenum, gastric angle, and gastric fundus) so that no underlying disease is missed. Pre-surgery equipment checks also play an important role, since a functioning device is a prerequisite for a successful surgery. Unfortunately, despite doctors' best efforts, some lesions may still be missed due to various factors.


To address the problems mentioned above, we explored a variety of deep-learning methods. For example, artifacts are detected using object detection algorithms. In particular, we studied the advantages and disadvantages of SSD (Single Shot MultiBox Detector) and Faster R-CNN, two popular object detection models representing single-stage and two-stage approaches, respectively. Additionally, to ensure that the doctor covers all areas of the upper gastrointestinal system during gastroscopy, we developed an image classification algorithm that indicates the camera's location at the time of the examination. Our upper gastrointestinal location classifier includes 12 classes: upper and lower esophagus; upper, middle, and lower gastric body; gastric antrum; duodenum; descending duodenum; gastric angle; upper and lower gastric fundus; and other backgrounds. Lastly, we created a classifier for detecting common gastric diseases such as ulcers and erosion, which can improve diagnosis by determining whether an image contains signs of gastric disease. Data for these classifiers were collected from patient records and surgical videos and labeled by experts, and we evaluated both CNN-based and transformer-based models for each classifier.
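To make the 12-class location classifier concrete, the sketch below shows how a 12-dimensional logit vector, such as the output head of a CNN or vision-transformer backbone, could be mapped to a location label and confidence score. This is a minimal illustration only; the class-name strings and function names are assumptions for exposition, not the labels or code used in the dissertation.

```python
import math

# The 12 upper-GI location classes described in the abstract.
# The exact label strings are illustrative placeholders.
LOCATION_CLASSES = [
    "upper esophagus", "lower esophagus",
    "upper gastric body", "middle gastric body", "lower gastric body",
    "gastric antrum", "duodenum", "descending duodenum",
    "gastric angle", "upper gastric fundus", "lower gastric fundus",
    "background",
]

def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_location(logits):
    """Map a 12-dimensional logit vector (hypothetical model output)
    to the most likely anatomical location and its confidence."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LOCATION_CLASSES[idx], probs[idx]
```

In practice the logits would come from a trained backbone; here, any vector whose largest entry is at index 5 would be reported as "gastric antrum" with its softmax confidence.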