07/03/2024
By Hossein Haeri

The Francis College of Engineering, Department of Mechanical and Industrial Engineering, invites you to attend a doctoral dissertation defense by Hossein Haeri on "Determining Temporal Validity of Data in Stream Learning under Concept Drift."

Defense Date: July 18, 2024
Time: 9 - 11 a.m.
Location: Southwick Hall, SOU240

Advisor: Kshitij Jerath

Committee Members:

  • Christopher Niezrecki (UMass Lowell MIE Department)
  • Hadi Amiri (UMass Lowell CS Department)
  • Craig C. Beal (Bucknell University ME Department)

Title: Determining Temporal Validity of Data in Stream Learning under Concept Drift

Abstract

In the era of big data, the ability to efficiently manage and analyze continuous data streams is critical for applications ranging from autonomous navigation systems to financial trading platforms. This dissertation addresses a fundamental challenge in stream learning: determining the temporal validity (or expiry date) of data in the presence of concept drift. Concept drift, the phenomenon where the statistical properties of a target variable change over time, poses significant hurdles for maintaining model accuracy in real-time, dynamic environments. The dissertation is primarily motivated by the problem of predicting road friction conditions using collective vehicular safety-critical measurements, enabling drivers, whether in autonomous or human-driven vehicles, to navigate roads more safely and efficiently.

This research first explores innovative methodologies for estimating the optimal data retention period in stream learning models. The study introduces a novel Allan variance-based technique for moving average estimation, optimizing the window size to balance noise reduction and responsiveness to genuine data changes. By systematically analyzing the effects of non-stationary noise and continuously drifting concepts, the dissertation provides a framework for enhancing the adaptability of machine learning models in non-stationary environments.
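
As a rough illustration of the window-size trade-off described above, the sketch below computes the standard non-overlapping Allan variance of a stream at several block lengths and picks the length with the smallest value. It is a minimal example under stated assumptions, not necessarily the procedure used in the dissertation: the function names (allan_variance, pick_window), the candidate window sizes, and the synthetic stream (a slow random walk observed through white noise) are all hypothetical choices made for illustration.

```python
import numpy as np

def allan_variance(y, m):
    """Non-overlapping Allan variance of series y at a block length of m samples."""
    y = np.asarray(y, dtype=float)
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two full blocks of length m")
    # Average the series over consecutive, non-overlapping blocks of m samples.
    block_means = y[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
    # Half the mean squared difference between adjacent block averages.
    return 0.5 * np.mean(np.diff(block_means) ** 2)

def pick_window(y, candidates):
    """Return the candidate block length with the smallest Allan variance."""
    avars = [allan_variance(y, m) for m in candidates]
    return candidates[int(np.argmin(avars))], avars

# Hypothetical synthetic stream: a slowly drifting value plus measurement noise.
rng = np.random.default_rng(0)
n = 100_000
drifting_value = np.cumsum(rng.normal(0.0, 0.01, n))  # random-walk drift (assumed)
y = drifting_value + rng.normal(0.0, 1.0, n)          # white measurement noise (assumed)

candidates = [2 ** k for k in range(12)]  # window sizes 1, 2, 4, ..., 2048
best_m, avars = pick_window(y, candidates)
print("window size with minimum Allan variance:", best_m)
```

Short windows leave the white noise largely unaveraged, while long windows blur the drift, so the Allan variance of such a stream typically falls and then rises again as the window grows. The minimizing window is one reasonable candidate for the moving-average length.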

The dissertation also proposes the Multi-Scale Model Stability Analysis (MSMSA) method. MSMSA is designed to estimate the validity horizon of data in stream learning regression tasks, accommodating continuous concept drift and varying noise characteristics. This method allows for the precise determination of the temporal boundary within which data remains relevant, thereby improving the accuracy and efficiency of real-time predictive models.

Moreover, the research introduces a strategy for regional concept drift adaptation, which addresses the challenge of localized drifts within specific regions of the feature space. This approach ensures that the model retains relevant data while discarding outdated information, thereby optimizing model performance without increasing complexity.

The methodologies proposed in this dissertation are validated through rigorous experimentation using both synthetic and real-world data streams. Results demonstrate significant improvements in model stability and prediction accuracy, highlighting the practical applicability of these techniques in various domains, including urban traffic management and financial forecasting.