06/29/2022
By Sokny Long

The Francis College of Engineering, Department of Electrical & Computer Engineering, invites you to attend a doctoral dissertation defense by Chenxi Wang on “Deep Learning Approaches for Pose Estimation and Analysis.”

Ph.D. Candidate: Chenxi Wang
Defense Date: Thursday, July 7, 2022
Time: 10 a.m. to noon ET
Location: This will be a virtual defense via Zoom. Those interested in attending should contact Chenxi Wang (chenxi_wang1@student.uml.edu) and committee advisor Yan Luo (yan_luo@uml.edu) at least 24 hours before the defense to request access to the meeting.

Committee Chair (Advisor): Yan Luo, Ph.D., Professor, Department of Electrical and Computer Engineering

Committee Members:

  • Hengyong Yu, Ph.D., Professor, Electrical and Computer Engineering, UMass Lowell
  • Seung Woo Son, Ph.D., Professor, Electrical and Computer Engineering, UMass Lowell
  • Yu Cao, Ph.D., Professor, Computer Science, UMass Lowell

Brief Abstract:

Over the past decade, most computer vision research has emphasized deep learning because of its exceptional performance, especially in the field of pose estimation (PE). PE, one of the most challenging problems in computer vision, aims to locate body keypoints in an image or video. Although a few open datasets have emerged to facilitate the evaluation of pose detection methods, they are too generic to benefit domain-specific applications such as physical therapy, which has quantitative clinical metrics and requires precise differentiation and measurement. To address this issue, we design, develop, and evaluate a lightweight lower-body rehabilitation system based on HRNet. It achieves performance competitive with state-of-the-art methods with far fewer parameters and a much lower computational cost. Moreover, we construct the first keypoint detection dataset for physical therapy, in particular lower-body rehabilitation.
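
As general background (not code from the dissertation): HRNet-style detectors typically predict one heatmap per body keypoint and read coordinates off each heatmap's peak. The short PyTorch sketch below illustrates that generic decoding step; the 17-keypoint, 64x48 heatmap shapes are illustrative assumptions.

    import torch

    def decode_heatmaps(heatmaps: torch.Tensor) -> torch.Tensor:
        """Turn predicted keypoint heatmaps (B, K, H, W) into (x, y) pixel
        coordinates (B, K, 2) by taking each channel's peak response.
        Generic heatmap decoding as used by HRNet-style pose estimators."""
        b, k, h, w = heatmaps.shape
        flat = heatmaps.view(b, k, -1)                      # flatten spatial dims
        idx = flat.argmax(dim=-1)                           # index of peak per keypoint
        xs = (idx % w).float()                              # column -> x coordinate
        ys = torch.div(idx, w, rounding_mode="floor").float()  # row -> y coordinate
        return torch.stack((xs, ys), dim=-1)

    # Example: 17 COCO-style keypoints on a 64x48 heatmap grid (assumed sizes)
    coords = decode_heatmaps(torch.rand(1, 17, 64, 48))
    print(coords.shape)  # torch.Size([1, 17, 2])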

Furthermore, motivated by the success of the self-attention mechanism in natural language processing (NLP), a plethora of studies have applied the transformer architecture to various computer vision tasks. To explore the capability of transformers, we propose a novel transformer-based model enhanced with a pyramid feature fusion structure. We use a pre-trained Swin Transformer to extract features and leverage a feature pyramid structure to extract and fuse feature maps from different stages. Our experimental results show that the transformer-based model outperforms the state-of-the-art CNN-based model.
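
As a rough illustration of the kind of pyramid feature fusion described above, the sketch below projects multi-stage backbone feature maps to a common channel width and sums them FPN-style. It is a generic sketch, not the dissertation's design; the channel widths and spatial sizes assume a Swin-Tiny backbone on a 224x224 input, used here only as a stand-in.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidFusion(nn.Module):
        """Fuse multi-stage feature maps (finest first): project each stage to a
        common width with 1x1 convs, upsample coarser maps, and sum them."""

        def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
            super().__init__()
            self.lateral = nn.ModuleList(
                [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
            )

        def forward(self, feats):
            # feats: list of (B, C_i, H_i, W_i), highest resolution first
            laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
            fused = laterals[0]
            for lat in laterals[1:]:
                fused = fused + F.interpolate(
                    lat, size=fused.shape[-2:], mode="bilinear", align_corners=False
                )
            return fused

    # Stand-in tensors shaped like the four stages of a Swin-T backbone (assumed)
    feats = [torch.rand(1, c, s, s) for c, s in [(96, 56), (192, 28), (384, 14), (768, 7)]]
    print(PyramidFusion()(feats).shape)  # torch.Size([1, 256, 56, 56])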

Additionally, taking advantage of both the transformer architecture and Convolutional Neural Networks (CNNs), we propose a blended approach that simultaneously captures long-range spatial dependencies and fuses them with local features extracted from the input images. The proposed approach precisely predicts keypoint positions and outperforms mainstream convolutional neural network architectures on the Microsoft COCO 2017 dataset.
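
As a generic illustration of blending a convolutional branch (local features) with a self-attention branch (long-range dependencies) over the same feature map, a minimal sketch follows; the module name, channel width, and fusion by addition are assumptions for illustration, not the dissertation's actual model.

    import torch
    import torch.nn as nn

    class BlendBlock(nn.Module):
        """Combine a 3x3 convolutional branch with a self-attention branch
        computed over the flattened feature map, then fuse by addition."""

        def __init__(self, channels=256, heads=8):
            super().__init__()
            self.local = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.norm = nn.LayerNorm(channels)

        def forward(self, x):
            b, c, h, w = x.shape
            local = self.local(x)                        # local features (3x3 conv)
            tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
            glob, _ = self.attn(tokens, tokens, tokens)  # global self-attention
            glob = self.norm(glob).transpose(1, 2).reshape(b, c, h, w)
            return local + glob                          # fuse the two branches

    x = torch.rand(1, 256, 32, 32)
    print(BlendBlock()(x).shape)  # torch.Size([1, 256, 32, 32])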

All interested students and faculty members are invited to attend the online defense via remote access.