11/03/2025
By Xingyu Lyu
The Kennedy College of Sciences, Richard A. Miner School of Computer & Information Sciences, invites you to attend a doctoral dissertation defense by Xingyu Lyu titled, "Toward Trustworthy Machine Learning Systems: Federated Learning and Large Models in Adversarial Settings."
Candidate Name: Xingyu Lyu
Date: Friday, November 14, 2025
Time: 10–11 a.m. EST
Location: This will be a virtual defense via Zoom.
Dissertation Committee:
- Ian Chen (Advisor), Assistant Professor, Miner School of Computer & Information Sciences, University of Massachusetts Lowell
- Benyuan Liu, Professor, Miner School of Computer & Information Sciences; UMass Center for Digital Health (CDH); Computer Networking Lab, CHORDS, University of Massachusetts Lowell
- Ning (Nicole) Wang, Assistant Professor, Department of Computer Science and Engineering, University of South Florida (USF)
- Xinwen Fu, Professor and Director, iSAFER Center, Miner School of Computer & Information Sciences, Kennedy College of Sciences, University of Massachusetts Lowell
Abstract
Federated Learning (FL) enables distributed clients to collaboratively train a global model without sharing raw data, yet its decentralized nature exposes it to adversarial risks. This dissertation systematically investigates attacks and defenses across both federated and large language model (LLM) ecosystems, aiming to advance trustworthy and privacy-preserving machine learning.
We first explore vulnerabilities in wireless FL client selection. Existing selection strategies rely heavily on channel conditions but overlook the risk of channel state information (CSI) forgery. To demonstrate this risk, we propose AirTrojan, a novel attack that forges CSI to manipulate client selection probabilities and facilitate model poisoning, exposing a fundamental weakness in wireless FL systems.
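To make the attack surface concrete (this is an illustrative sketch, not the AirTrojan design itself), the snippet below shows how a channel-quality-weighted selector, a common wireless-FL heuristic, can be skewed by a single forged channel report; all numbers and names are hypothetical.

```python
import numpy as np

def selection_probs(reported_channel_gains):
    """Channel-quality-weighted client selection: each client is picked with
    probability proportional to its reported gain (illustration only)."""
    g = np.asarray(reported_channel_gains, dtype=float)
    return g / g.sum()

# Hypothetical round with five honest clients and one attacker.
honest = [1.0, 1.2, 0.8, 1.1, 0.9]
print(selection_probs(honest + [1.0]))   # attacker reports truthfully (~0.17)
print(selection_probs(honest + [10.0]))  # attacker forges inflated CSI (~0.67)
```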
Next, we present FLBuff, a defense framework that introduces a supervised contrastive buffer layer between benign and malicious updates, enhancing robustness against backdoor attacks under diverse non-IID data distributions. Building on this foundation, we propose GeminiGuard, an unsupervised defense that leverages dynamic clustering and multi-layer trust scoring to counter model poisoning attacks, substantially outperforming state-of-the-art methods in non-IID environments.
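As a point of reference only (not the FLBuff or GeminiGuard mechanisms themselves), the minimal sketch below shows the general idea behind clustering-based robust aggregation: client updates are grouped, the minority cluster is treated as suspect, and only the majority is averaged. All names and values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_and_aggregate(client_updates):
    """Generic robust-aggregation sketch: cluster flattened client updates
    into two groups and average the larger cluster, treating the smaller
    one as suspect."""
    updates = np.stack([u.ravel() for u in client_updates])
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(updates)
    majority = np.bincount(labels).argmax()
    return updates[labels == majority].mean(axis=0)

# Example: ten benign updates near zero, two shifted "poisoned" updates.
rng = np.random.default_rng(0)
benign = [rng.normal(0.0, 0.1, size=100) for _ in range(10)]
poisoned = [rng.normal(3.0, 0.1, size=100) for _ in range(2)]
global_update = filter_and_aggregate(benign + poisoned)
print(global_update[:5])  # stays close to the benign mean
```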
Beyond FL, we extend our investigation to privacy risks in Retrieval-Augmented Generation (RAG) and LLM-based agents. We develop ALDEN, an active-learning–driven data extraction attack that efficiently infers latent topic distributions to boost private data recovery from RAG systems. We further introduce ADAM, a systematic memory extraction attack on LLM agents that employs entropy-guided adaptive querying to achieve near-perfect data leakage. To mitigate these threats, we propose RAG-CT, a lightweight defense that scans query entropy and margin distributions to detect and block malicious prompts, effectively reducing personally identifiable information (PII) leakage without altering the underlying LLM or retriever. Finally, we outline several ongoing and future research directions focused on strengthening defense mechanisms for agent-based systems and extending trustworthy learning to broader AI ecosystems.
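For illustration only (RAG-CT's actual scoring is described in the dissertation), the sketch below shows one generic way a retrieval front end can use entropy over retrieval scores to flag broad extraction probes: a focused query concentrates similarity mass on a few documents, while a sweeping probe matches many documents almost uniformly. The threshold and scores are placeholders.

```python
import numpy as np

def retrieval_entropy(similarities: np.ndarray) -> float:
    """Entropy of the softmax-normalized retrieval scores: low for focused
    queries, high for queries that match many documents uniformly."""
    p = np.exp(similarities - similarities.max())
    p /= p.sum()
    return float(-(p * np.log2(p + 1e-12)).sum())

def screen_query(similarities: np.ndarray, threshold: float = 2.0) -> bool:
    """Illustrative gate: pass the query to the LLM only if its
    retrieval-score entropy stays below a placeholder threshold."""
    return retrieval_entropy(similarities) <= threshold

# Hypothetical similarity scores over eight indexed documents.
focused = np.array([5.0, 1.0, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0])
broad   = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
print(screen_query(focused), screen_query(broad))  # True False
```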
Collectively, these contributions provide a unified understanding of robustness and privacy vulnerabilities in decentralized and retrieval-augmented learning frameworks, offering new insights and practical mechanisms toward secure, resilient, and trustworthy AI systems.