01/15/2021
By Sokny Long

The Francis College of Engineering, Department of Electrical and Computer Engineering, invites you to attend a doctoral proposal defense by Xiaoyan Zhuo on “Anomaly Detection with Data Efficient Representation Learning.”

PhD Candidate: Xiaoyan Zhuo
Defense Date: Friday, Jan. 29, 2021
Time: 10:30 a.m. to noon EST
Location: This will be a virtual defense via Zoom. Those interested in attending should contact Xiaoyan_Zhuo@student.uml.edu and the committee advisor, SeungWoo_Son@uml.edu, at least 24 hours prior to the defense to request access to the meeting.

Committee Members:

  • Chair (Advisor): Seung Woo Son, Associate Professor, Electrical and Computer Engineering, University of Massachusetts Lowell
  • Hengyong Yu, Professor, Electrical and Computer Engineering, University of Massachusetts Lowell
  • Yan Luo, Professor, Electrical and Computer Engineering, University of Massachusetts Lowell

Abstract:
Anomaly detection refers to identifying items, events, or observations that do not conform to an expected pattern or a well-defined notion of normal behavior. Detecting outliers or anomalies in data has been studied for decades in diverse research areas and application contexts such as fraud detection, network intrusion detection, lesion identification in medical images, and manufacturing defect detection. In recent years, numerous machine learning (ML) and deep learning (DL) models for Natural Language Processing (NLP) and Computer Vision (CV) tasks, such as text data embedding, time series prediction, image classification, and object detection, have been proposed and have achieved remarkable improvements in anomaly detection performance. However, several factors still make anomaly detection challenging. First, huge volumes of complex structured and unstructured data are rapidly generated and collected. Second, massive amounts of data arrive as streams, which requires timely and efficient online analysis. Lastly, properly labeled data are rarely available in real-world applications, whereas current state-of-the-art ML-based anomaly detection models learn to classify normal and anomalous behaviors from large amounts of labeled data.

In this work, we first present a case study in which word embedding models, a feature learning technique in NLP, are applied to non-textual data for network intrusion detection. The proposed method converts a large volume of non-textual network log data into a smaller set of semantic features and achieves an F1 score of 0.95 for anomaly detection. We also propose a novel approach based on a sparsity profile for detecting anomalies in streaming data. Our experimental evaluation using real-world sensor network datasets demonstrates that the proposed method can detect 83%–92% of anomalies using only 1.7% of the original data. Furthermore, to address the scarcity of properly labeled data, we propose methods based on weakly supervised and semi-supervised learning that leverage weak labels and large amounts of unlabeled data. We evaluate the proposed methods on real-world datasets for lesion identification in medical images and defect detection in manufacturing.

All interested students and faculty members are invited to attend the online defense via remote access.