11/15/2021
By Xuanming Liu

The Kennedy College of Sciences, Department of Computer Science, invites you to doctoral thesis defense by Xuanming Liu on "Hot and Dense Items: Sketch and Stochastic Model Approaches for Fast Streaming Data and Graphs."

Ph.D. Candidate: Xuanming Liu
Date: Monday, Nov. 29, 2021
Time: 9:30 a.m.
Location: This will be a virtual defense via Zoom.

Committee Members:

  • Tingjian Ge (advisor), Professor, Department of Computer Science
  • Cindy Chen (member), Associate Professor, Department of Computer Science
  • Yinghui Wu (member), Associate Professor, Department of Computer & Data Sciences, Case Western Reserve University

Abstract:
Data streams in general, and graph streams in particular, are ubiquitous in data analysis in various domains. This thesis specifically investigates three types of analytical queries: (1) top-k most frequent items in data streams in general, (2) densest lasting subgraph queries in dynamic graph streams, and (3) predictive relationship queries in knowledge graph streams. These together cover some of the most important queries—heavy-hitters and predictions—for streams.

In data streams, a common heavy-hitter query is continuously querying top-k most frequent items in sliding windows of any size. We devise a succinct probabilistic data structure, a Floating Top-k pool, to retrieve the top-k items dynamically. We prove that its space and time costs grow only logarithmically with the window size, in addition to its accuracy guarantees. An analogous counterpart of heavy-hitters in graph streams is to find the densest lasting subgraphs, which has wide applications in telecommunication hot-spot detection, road traffic control, spam network filtering, and dynamic community detection. We propose a framework called Expectation-Maximization with Utility functions (EMU), a novel stochastic approach that nontrivially extends the conventional EM. EMU has the flexibility of optimizing any user-defined utility functions. We validated our EMU approach by showing that it converges to the optimum — by proving that it is a specification of the general Minorization-Maximization (MM) framework with convergence guarantees. We then devise EMU algorithms for the densest lasting subgraph problem. Using real-world graph data, we experimentally verified the effectiveness and efficiency of our techniques, and compared with two prior approaches on dense subgraph detection.

Infinite graph streams naturally extend to the future. So last but not least, we study relational machine learning for predictive relationship queries in knowledge graph streams, such as those in dynamic recommendation and traffic prediction. We devise a Count-Fading sketch data structure, as well as an online incremental graph embedding algorithm, to tackle the challenges of heterogeneous relations, complex topological and temporal correlations, and high rates of data and pattern changes.