08/09/2021
By Sokny Long

The Francis College of Engineering, Department of Electrical and Computer Engineering, invites you to attend a doctoral dissertation defense by Xiaoyan Zhuo on “Anomaly Detection with Data Efficient Representation Learning.”

Ph.D. Candidate: Xiaoyan Zhuo
Defense Date: Friday, Aug. 20, 2021
Time: 10 a.m. to noon EDT
Location: This will be a virtual defense via Zoom. Those interested in attending should contact Xiaoyan_Zhuo@student.uml.edu and the committee advisor, SeungWoo_Son@uml.edu, at least 24 hours prior to the defense to request access to the meeting.

Committee Chair (Advisor): Seung Woo Son, Associate Professor, Electrical and Computer Engineering, University of Massachusetts Lowell

Committee Members:

  • Hengyong Yu, Professor, Electrical and Computer Engineering, University of Massachusetts Lowell
  • Yan Luo, Professor, Electrical and Computer Engineering, University of Massachusetts Lowell

Brief Abstract:
Anomaly detection refers to identifying items, events, or observations that do not conform to an expected pattern or a well-defined notion of normal behavior. In recent years, numerous machine learning (ML) and deep learning models have been proposed and have achieved remarkable improvements in anomaly detection performance. Several factors, however, still make current anomaly detection tasks challenging. First, large volumes of complex, high-dimensional data are rapidly generated and collected. Second, massive amounts of data arrive as streams, which requires timely and efficient online analysis. Lastly, properly labeled data are scarce in real-world applications, whereas current state-of-the-art ML-based anomaly detection models learn to classify normal and anomalous behaviors from a large amount of labeled data.

To tackle these challenges, we propose data-efficient representation learning techniques for anomaly detection. First, we present a case study where word embedding models are employed to convert a large volume of non-textual network log data into a smaller set of semantic features. We also propose a novel approach based on a sparsity profile for detecting anomalies in stream data. We further propose a cascaded dimension reduction technique to extract the most essential information from high-dimensional data with a much smaller set of features. Lastly, to address the scarcity of available labels, we propose a distribution-aware pseudo-labeling method that incorporates a t-distribution confidence interval and adaptive training strategies to obtain more pseudo-labels with high confidence. Our extensive experiments on various real-world datasets demonstrate that our proposed methods can effectively detect anomalies using only 2%-10% of the original or fully labeled data, and outperform state-of-the-art anomaly detection techniques.

All interested students and faculty members are invited to attend the online defense via remote access.