Spring 2026

Knowledge is Power, But Power Casts Shadows?

Date: January 23, 2026

Location: Join via Zoom for “Knowledge is Power, But Power Casts Shadows?” (January 23, 2026)
Speaker: Yuji Zhang
Affiliation: University of Illinois Urbana-Champaign
Host: Hong Yu

Abstract

Knowledge is a prerequisite for intelligent behavior. Although modern language models acquire substantial knowledge and can leverage it to solve tasks with superior performance, the underlying mechanisms often remain opaque and fragile. This talk presents knowledgeable foundation models for robust intelligence and asks: What does the model know? When and why does it fail? How can we update it with minimal trade-off? And how do we use it to reason and ultimately to optimize decisions?

First, we make model knowledge explicit and testable and investigate why unreliable knowledge emerges, covering both hallucination and staleness. We identify knowledge overshadowing, in which individually correct pieces of knowledge are miscomposed in ways that trigger hallucinations, and we show how to quantify and even foresee such failures, gaining greater model controllability and robustness. These diagnoses drive targeted, localized repairs whose side effects are bounded. Second, we operationalize knowledge as interpretable, composable atomic skills, enabling modular reasoning that strengthens generalization and robustness. Finally, we translate interpretable knowledge into decision value by aligning reasoning with downstream utility.


Toward Automatic Compiler Verification for Modular Metatheory

Date: 

Location: Dandeneau Hall (Room 321)
Speaker: Dustin Jamner
Affiliation: Massachusetts Institute of Technology (MIT)
Host: Anitha Gollamudi

Abstract

Common techniques for semantic reasoning are often tied to the specific structures of the languages and compilers that they support. This limitation restricts the extension and composition of semantics in these systems. Additionally, mechanized proofs are usually either highly manual or use similarly bespoke automation. As a result of these two limitations, existing compiler verification efforts do not admit critical modes of extension or reusability.

In this talk, I will present Pyrosome, a framework for modular compiler verification, implemented in Rocq, that embodies a novel approach to extensible semantics and compilation. To demonstrate Pyrosome, I will present components of a verified multipass compiler for System F with simple references that performs CPS translation and closure conversion. In the second part of the talk, I will present ongoing work on language-agnostic tooling that fully automates compiler verification using e-graphs.


From Large Language Models to Large Agent Models: Reasoning with the World

Date: February 6, 2026

Location: Join via Zoom for “From Large Language Models to Large Agent Models: Reasoning with the World” (February 6, 2026)
Speaker: Manling Li
Affiliation: Northwestern University
Host: Hong Yu

Abstract

The leap from Large Language Models to Large Agent Models lies in unfolding reasoning into multi-turn interactions with the world. We take the first step by formalizing agent training as a Markov Decision Process (MDP) and introducing an agent reasoning interface (RAGEN) to avoid “reasoning collapse”, where agents loop into repetitive reasoning and fail to explore. Extending to Partially Observable MDPs (POMDPs), we propose VAGEN, which trains agents to internalize world models for state estimation (“what is seen”) and transition modeling (“what comes next”).
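
As a rough, purely illustrative sketch of the MDP framing (the env and agent interfaces below are hypothetical placeholders, not the RAGEN or VAGEN APIs), agent training operates over multi-turn trajectories of states, actions, and rewards:

    from dataclasses import dataclass

    @dataclass
    class Step:
        state: str      # what the agent observed at this turn
        action: str     # the agent's reasoning and chosen action
        reward: float   # scalar feedback from the environment

    def rollout(env, agent, max_turns: int = 8) -> list[Step]:
        """Collect one multi-turn trajectory; in the MDP framing, such
        state-action-reward trajectories are the unit of agent training."""
        trajectory: list[Step] = []
        state = env.reset()
        for _ in range(max_turns):
            action = agent.act(state)                    # model emits reasoning plus an action
            next_state, reward, done = env.step(action)  # hypothetical 3-tuple environment API
            trajectory.append(Step(state, action, reward))
            state = next_state
            if done:
                break
        return trajectory

Under a POMDP, state above would be only a partial observation, and VAGEN-style training would additionally ask the agent to estimate the underlying state and predict what comes next.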

We further show how the divergence between Pass@1 and Pass@K reveals shallow exploitation rather than true exploration, and we use self-play to inject world-model knowledge and diversify exploration. Finally, we introduce cognitive maps as structured reasoning interfaces that integrate partial observations into coherent world beliefs. Taken together, these advances chart a path toward agents that simulate, explore, and actively construct internal world models: a decisive step from LLMs to LAMs.
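
For reference, Pass@K is commonly computed with the unbiased combinatorial estimator sketched below (a generic illustration, not necessarily the speaker's exact protocol): from n sampled solutions per problem, c of which are correct, it estimates the probability that at least one of k drawn samples is correct. Comparing Pass@1 with Pass@K separates single-attempt reliability from the diversity of solutions a model can reach across many samples.

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased estimator of pass@k for one problem: the probability that
        at least one of k samples (drawn without replacement from n generated
        samples, c of which are correct) is correct."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Example: 100 samples per problem, 12 of them correct.
    print(round(pass_at_k(100, 12, 1), 3))   # 0.12  (Pass@1)
    print(round(pass_at_k(100, 12, 8), 3))   # ~0.65 (Pass@8)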


Understanding LLM Capabilities on Large-scale Multilingual Real-World Clinical Data

Date: February 13, 2026

Location: Join via Zoom for “Understanding LLM Capabilities on Large-scale Multilingual Real-World Clinical Data” (February 13, 2026)
Speaker: Jie Yang
Affiliation: Harvard University
Host: Hong Yu

Abstract

Large language models (LLMs) are increasingly used in healthcare, yet most evaluations rely on clean, exam-style datasets that fail to capture the complexity of real-world clinical data, such as electronic health records (EHRs), and rarely keep pace with rapidly evolving LLMs. In this talk, we will present BRIDGE, a multilingual benchmark constructed from real clinical tasks and over one million EHR-derived samples, where we evaluated 95 leading LLMs through 24,000+ experiments and 39 million predictions. Our findings show wide variation across model families, tasks, and languages, with several open-source models matching proprietary ones. We also observe that chain-of-thought prompting often lowers accuracy for these clinical tasks, and we provide the first large-scale analysis of stigmatized language generated during model reasoning.
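
As context for the chain-of-thought finding, the sketch below shows the kind of prompting contrast such an evaluation typically involves; the prompt wording, the example note, and the query_llm call are hypothetical placeholders, not the BRIDGE harness itself.

    def build_prompts(task_instruction: str, ehr_note: str) -> dict[str, str]:
        """Build two prompt variants for the same clinical task:
        a direct-answer prompt and a chain-of-thought (CoT) prompt."""
        base = f"{task_instruction}\n\nClinical note:\n{ehr_note}\n\n"
        return {
            "direct": base + "Answer with the label only.",
            "cot": base + "Think step by step, then give the final label on the last line.",
        }

    # Hypothetical usage: score both variants against a gold label and
    # compare accuracy across models, tasks, and languages.
    prompts = build_prompts(
        "Classify whether the note documents a medication adverse event (yes/no).",
        "Patient developed a rash after starting amoxicillin; drug discontinued.",
    )
    # for name, prompt in prompts.items():
    #     prediction = query_llm(prompt)   # query_llm is a placeholder for any LLM API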


Beyond the Patient Portal: Generative AI for Lab Result Interpretation and Empowerment

Date: February 20, 2026

Location: Join via Zoom for “Beyond the Patient Portal: Generative AI for Lab Result Interpretation and Empowerment” (February 20, 2026)
Speaker: Zhe He
Affiliation: Florida State University
Host: Hong Yu

Abstract

This talk presents the data science foundations of the LabGenie project, focusing on the evaluation of large language models (LLMs) for laboratory test interpretation. I will discuss our empirical studies examining how LLMs answer lab-related questions and the challenges they face in numerical accuracy and contextual understanding. I will also highlight our findings on the role of lab data in improving LLM-based differential diagnosis. Finally, I will introduce LabQAR, a curated benchmark dataset for evaluating LLM performance on lab reference ranges and result interpretation. Together, this work positions LabGenie as a research testbed for advancing clinically grounded and trustworthy AI in laboratory medicine.


View past Computer Science colloquia talks.