10/11/2022
By Hossein Haeri
The Francis College of Engineering, Department of Mechanical Engineering, invites you to attend a doctoral proposal defense by Hossein Haeri on “Recency of Temporal Data: When do data expire?”
Ph.D. Candidate: Hossein Haeri
Defense Date: Thursday, Oct. 24, 2022
Time: 1 to 3 p.m.
Location: Virtual (Zoom meeting ID: 922 5924 3183).
Thesis/Dissertation Title: Recency of Temporal Data: When do data expire?
Committee Chair (Advisor): Kshitij Jerath, Department of Mechanical Engineering, University of Massachusetts Lowell
Committee Members:
- Christopher Niezrecki, Department of Mechanical Engineering, University of Massachusetts Lowell
- Craig Beal, Department of Mechanical Engineering, Bucknell University
- Hadi Amiri, Department of Computer Science, University of Massachusetts Lowell
- Bartosz Krawczyk, Department of Computer Science, Virginia Commonwealth University
Abstract:
Conventional data-driven predictors and estimators assume that data never expire, i.e., even very 'old' data can help in making better predictions. Nevertheless, many real-world data sets are obtained from large-scale complex systems which may exhibit unforeseen behaviors that cause data streams (or their distributions) to constantly change over time, a notion known as concept drift. In such a scenario, the model needs to constantly adapt itself to the 'new' information, but new data by itself may not be rich enough to produce an effective model.
In this work, we propose methods and strategies that adaptively and systematically determine the optimal scale in time (what we refer to as the 'characteristic timescale') at which data or measurements may be considered 'recent', i.e., they lead to optimal estimation and/or prediction. Specifically, we adopt a multi-scale analysis approach and use Allan Variance as a measure of the recency of the measurements and test it in a discrete-time parameter estimation problem. Our preliminary results show that Allan Variance can optimally identify the characteristic timescale of the data for moving average estimation tasks and outperforms other state-of-the-art adaptive windowing approaches. We also extended the original definition of the Allan Variance to determine the characteristic timescale for a continuous-time parameter estimation problem. We propose to expand the multi-scale analysis approach and deploy statistical tools such as KL divergence to determine the characteristic timescale of data sets. These novel approaches will be evaluated in both supervised and unsupervised online learning scenarios. This work is motivated by the need to accurately predict road friction in safety-critical scenarios. However, our contribution will more broadly empower engineers and scientists trying to build adaptive data-driven estimators, especially those who want to model complex systems such as large-scale cyber-physical systems, social behaviors, biological systems, and socioeconomic processes in real time.
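For readers unfamiliar with Allan Variance, the following is a minimal illustrative sketch, not the candidate's implementation. It assumes the standard non-overlapping definition (half the mean squared difference between the averages of adjacent windows) and assumes that the window length minimizing it is taken as the characteristic timescale for a moving average. The function names `allan_variance` and `characteristic_timescale` are hypothetical.

```python
import numpy as np


def allan_variance(y, m):
    """Non-overlapping Allan variance of samples y at window length m.

    Computed as half the mean squared difference between the averages
    of adjacent, non-overlapping windows of m samples each.
    """
    y = np.asarray(y, dtype=float)
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two full windows of length m")
    # Drop any trailing samples that do not fill a whole window.
    block_means = y[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
    return 0.5 * np.mean(np.diff(block_means) ** 2)


def characteristic_timescale(y, window_lengths):
    """Return the candidate window length with the smallest Allan variance.

    Assumption for illustration: the minimizer is treated as the
    characteristic timescale for a moving-average estimator.
    """
    window_lengths = list(window_lengths)
    avars = [allan_variance(y, m) for m in window_lengths]
    return window_lengths[int(np.argmin(avars))], avars


# Synthetic example: a slowly drifting parameter (random walk)
# observed through white measurement noise.
rng = np.random.default_rng(0)
drift = np.cumsum(0.01 * rng.standard_normal(20_000))
measurements = drift + rng.standard_normal(20_000)
best_m, avars = characteristic_timescale(measurements, [2 ** k for k in range(1, 12)])
```

Short windows are dominated by measurement noise and long windows by drift, so the Allan variance of such a signal is typically smallest at an intermediate window length, which is the intuition behind using it to decide how much past data still counts as recent.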