04/02/2021
By Lorina Sinanaj
The Kennedy College of Sciences, Department of Computer Science, invites you to attend a Master's thesis defense by Lorina Sinanaj on “Allan Variance-based Granulation Technique for Large Temporal Databases.”
Candidate Name: Lorina Sinanaj
Degree: Master’s
Defense Date: Thursday, April 15, 2021
Time: 3-4:30 p.m. EST
Location: This will be a virtual defense via Zoom. Those interested in attending should contact the student at lorina_sinanaj@student.uml.edu at least 24 hours before the defense to request access to the meeting.
Thesis Title: “Allan Variance-based Granulation Technique for Large Temporal Databases”
Advisor: Cindy Chen, Department of Computer Science, University of Massachusetts Lowell
Committee Members:
- Benyuan Liu, Department of Computer Science, University of Massachusetts Lowell
- Kshitij Jerath, Department of Mechanical Engineering, University of Massachusetts Lowell
Brief Abstract:
In the era of Big Data, with the dramatic rise in the amount of data, conducting complex data analysis tasks efficiently becomes increasingly important and challenging. In order to decrease query response time with limited main memory and storage space, data reduction techniques that preserve data quality are needed. Existing data reduction techniques, however, are often computationally expensive and rely on heuristics for deciding how to split or reduce the original dataset.
In this thesis, we propose an effective granular data reduction technique for temporal databases, based on Allan Variance (AVAR). AVAR is used to systematically determine the temporal window length over which data remains relevant. The entire dataset to be reduced is then separated into granules with size equal to the AVAR-determined window length. Data reduction is achieved by generating aggregated information for each such granule. The proposed method is tested using a large database that contains temporal information for vehicular data. The performance results demonstrate that the proposed Allan Variance-based technique can efficiently generate a reduced representation of the original data without losing data quality, while significantly reducing computation time.
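The general idea of AVAR-based granulation can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes a uniformly sampled numeric series, uses the standard non-overlapping Allan variance estimator, assumes the window is chosen as the candidate length with the smallest AVAR (the abstract does not state the selection rule), and uses the mean as the per-granule aggregate. The function names `allan_variance`, `choose_window`, and `granulate` are hypothetical.

```python
from statistics import fmean


def block_means(values, m):
    """Mean of each consecutive, non-overlapping block of m samples.

    A trailing partial block is dropped (a simplification for this sketch).
    """
    n_blocks = len(values) // m
    return [fmean(values[i * m:(i + 1) * m]) for i in range(n_blocks)]


def allan_variance(values, m):
    """Non-overlapping Allan variance for a window of m samples.

    Half the mean squared difference between adjacent block means.
    """
    means = block_means(values, m)
    if len(means) < 2:
        raise ValueError("need at least two full blocks of length m")
    squared_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return 0.5 * fmean(squared_diffs)


def choose_window(values, candidates):
    """Assumption: pick the candidate window length with the smallest AVAR."""
    return min(candidates, key=lambda m: allan_variance(values, m))


def granulate(values, m):
    """Reduce the series to one aggregate (here, the mean) per granule."""
    return block_means(values, m)


if __name__ == "__main__":
    # Toy series standing in for a temporal measurement such as vehicle speed.
    series = [1.0, 1.0, 3.0, 3.0, 1.0, 1.0, 3.0, 3.0]
    window = choose_window(series, [1, 2, 4])
    print(window, granulate(series, window))
```

In this sketch the reduced representation holds one value per granule, so a series of n samples shrinks to roughly n divided by the window length. Other aggregates (minimum, maximum, count) could be stored per granule in the same way.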