06/28/2022
By Yan Luo

The Francis College of Engineering, Department of Electrical & Computer Engineering, invites you to attend a doctoral dissertation defense by Chenxi Wang on “Deep Learning Approaches for Pose Estimation and Analysis.”

Candidate Name: Chenxi Wang
Degree: PhD in Computer Engineering
Defense Date: Thursday, July 7, 2022
Time: 10 a.m. to noon EDT
Location: This will be a virtual defense via Zoom. Those interested in attending should contact the student (chenxi_wang1@student.uml.edu) and committee advisor (yan_luo@uml.edu) at least 24 hours prior to the defense to request access to the meeting.

Committee:
Advisor: Yan Luo, Ph.D., Professor, Department of Electrical and Computer Engineering

Committee Members:

  • Hengyong Yu, Ph.D., Professor, Department of Electrical and Computer Engineering
  • Seung Woo Son, Ph.D., Professor, Department of Electrical and Computer Engineering
  • Yu Cao, Ph.D., Professor, Department of Computer Science

Brief Abstract:

Over the past decade, most research in computer vision has emphasized deep learning because of its exceptional performance, particularly in the field of pose estimation (PE). PE, one of the greatest challenges in computer vision, aims to locate body keypoints in an image or video. Although a few open datasets have emerged to facilitate the evaluation of pose detection methods, they are too generic to benefit domain-specific applications such as physical therapy, which has quantitative clinical metrics and requires precise differentiation and measurement. To address this issue, we design, develop, and evaluate a lightweight lower-body rehabilitation system based on HRNet. It achieves performance competitive with state-of-the-art methods with far fewer parameters and lower computational cost. Moreover, we construct the first keypoint detection dataset for physical therapy, in particular lower-body rehabilitation.
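Heatmap-based detectors such as HRNet typically regress one heatmap per joint and read each keypoint off as the heatmap's peak. The following numpy sketch illustrates that decoding step under simple assumptions (an output stride of 4 and argmax decoding); it is an illustrative example, not the candidate's implementation.

```python
import numpy as np

def decode_heatmaps(heatmaps, stride=4):
    """Convert per-joint heatmaps of shape (J, H, W) to image-space keypoints.

    Each joint's location is taken as the argmax of its heatmap and scaled
    back to input resolution by the network's output stride.
    """
    num_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))
    confidences = flat.max(axis=1)            # peak value as a confidence score
    keypoints = np.stack([xs * stride, ys * stride], axis=1)
    return keypoints, confidences

# Toy example: one joint whose heatmap peaks at (row=12, col=20).
hm = np.zeros((1, 64, 48))
hm[0, 12, 20] = 1.0
kpts, conf = decode_heatmaps(hm)
print(kpts[0])  # -> [80 48]  (x = 20*4, y = 12*4)
```

Real systems usually refine the argmax with a sub-pixel offset toward the second-highest neighbor, but the peak-plus-stride mapping above is the core of the decoding.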

Furthermore, following the success of the self-attention mechanism in natural language processing (NLP), a plethora of studies have applied the transformer architecture to various computer vision tasks. To explore the capability of transformers, we propose a novel model based on a transformer enhanced with a pyramid feature fusion structure. We use a pre-trained Swin Transformer to extract features, and leverage a feature pyramid structure to extract and fuse feature maps from different stages. In our experiments, the transformer-based model outperforms state-of-the-art CNN-based models.
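The pyramid fusion described above can be sketched as an FPN-style top-down pass: each deeper (coarser) stage is upsampled and added to the next shallower one. The numpy toy below assumes the backbone stages have already been projected to a common channel width; it is a minimal illustration, not the model's actual fusion module.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def pyramid_fuse(features):
    """Top-down fusion of backbone stages listed shallow-to-deep.

    `features` holds (C, H, W) maps with halving resolution, e.g. the
    stages of a Swin backbone after projection to a common channel width.
    Each deeper map is upsampled and added to the next shallower one.
    """
    fused = features[-1]                      # start from the deepest stage
    outputs = [fused]
    for f in reversed(features[:-1]):         # walk toward shallow stages
        fused = f + upsample2x(fused)
        outputs.append(fused)
    return outputs[::-1]                      # return in shallow-to-deep order

# Toy stages: 32x32, 16x16, and 8x8 maps with 8 channels each.
stages = [np.ones((8, 32, 32)), np.ones((8, 16, 16)), np.ones((8, 8, 8))]
fused = pyramid_fuse(stages)
print([f.shape for f in fused])  # -> [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

The fused maps keep their original resolutions, so the shallowest output combines fine spatial detail with semantics accumulated from every deeper stage.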

Additionally, taking advantage of both the transformer architecture and Convolutional Neural Networks (CNNs), we propose a blended approach that simultaneously captures long-range spatial dependencies and fuses them with local features extracted from the input images. The proposed approach precisely predicts keypoint positions and outperforms mainstream convolutional neural network architectures on the Microsoft COCO 2017 dataset.
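The blending idea can be illustrated with a two-path toy: a global path where every position attends to every other (self-attention), and a local path approximated by a small convolution, fused by addition. All choices below (identity attention projections, a 3x3 mean filter, additive fusion) are simplifying assumptions for illustration, not the candidate's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head self-attention with identity projections over (N, D) tokens:
    every token attends to every other, capturing long-range dependencies."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores, axis=-1) @ tokens

def local_conv(feature, k=3):
    """3x3 mean filter on an (H, W) map, standing in for a CNN's local features."""
    h, w = feature.shape
    pad = np.pad(feature, k // 2, mode="edge")
    out = np.zeros_like(feature)
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + k, j:j + k].mean()
    return out

def blended(feature):
    """Fuse the global attention path with the local conv path by addition."""
    h, w = feature.shape
    tokens = feature.reshape(h * w, 1)              # pixels as 1-D tokens
    global_path = self_attention(tokens).reshape(h, w)
    local_path = local_conv(feature)
    return global_path + local_path

out = blended(np.random.rand(8, 8))
print(out.shape)  # (8, 8)
```

In a real model the two paths would use learned projections and convolution kernels, but the structure is the same: a global branch, a local branch, and a fusion step.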