Fall 2025
Large Language Models for Biomedical Applications
Date: September 12, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Hua Xu, PhD, FACMI
Affiliation: Yale School of Medicine
Host: Hong Yu
Abstract
Recent breakthroughs in Large Language Models (LLMs) have sparked an unprecedented transformation across diverse fields, particularly in the biomedical domain. These advanced models are not only accelerating research but also revolutionizing clinical practices by enabling more efficient data analysis, improving decision-making processes, and facilitating innovative discoveries. In this talk, I'll share our cutting-edge methodologies and practical software solutions leveraging state-of-the-art LLMs such as GPT and LLaMA. We will highlight their applications in real-world evidence generation, medical diagnosis, and literature-based discovery. Additionally, we will discuss the compelling insights, challenges, and real-world experiences gained from applying these transformative technologies, illustrating how LLMs are reshaping the future of biomedical research and healthcare.
LLMs in Education: Integrating Code and Text Generation Models in Educational Applications
Date: September 19, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Marcos Zampieri
Affiliation: George Mason University
Host: Hadi Amiri
Abstract
Recent advances in Generative AI and Large Language Models (LLMs) have the potential to transform education. LLMs are becoming an important part of various educational applications including intelligent tutoring systems capable of handling text, images, and programming code. In this talk, I present recent work on the use of LLMs in education. The talk is divided into two parts. In the first part, I present research on LLMs in Computer Science (CS) education. In particular, I describe use cases of LLMs in CS education and code generation including recent benchmark work on introductory programming assignments and low-resource programming languages. In the second part of this talk, I describe LLMs applied to Natural Language Processing (NLP) within educational applications. I describe research on tasks such as lexical complexity prediction and text simplification.
Identification of Predictive Subphenotypes for Clinical Outcomes Using Real-World Data and Machine Learning
Date: September 22, 2025
Time: 2 p.m.
Location: Zoom
Speaker: Weishen Pan
Affiliation: Weill Cornell Medicine
Host: Hong Yu
Abstract
Predicting treatment response is an important problem in real-world applications, where the heterogeneity of the treatment response remains a significant challenge in practice. The growing availability of real-world data (RWD), such as electronic health records (EHRs), provides opportunities to address this challenge by clustering patients based on RWD. In this talk, I will review traditional unsupervised machine learning methods for subphenotyping and highlight their limitation of not ensuring coherent outcomes within identified groups. I will then introduce our proposed Graph-Encoded Mixture Survival (GEMS) framework, a general machine learning approach designed to identify predictive subphenotypes that simultaneously ensure coherent survival outcomes and consistent baseline characteristics. I will present results from applying GEMS to a large real-world dataset of advanced non-small cell lung cancer (aNSCLC) patients, demonstrating its effectiveness in predicting overall survival (OS) and uncovering clinically interpretable subgroups. I will conclude by discussing future opportunities and challenges in extending this framework to other disease contexts.
eRevise+RF: A Writing Evaluation System for Assessing Student Essay Revisions and Providing Formative Feedback
Date: September 26, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Diane Litman
Affiliation: University of Pittsburgh
Host: Hadi Amiri
Abstract
The ability to revise essays in response to feedback is important for students’ writing success. An automated writing evaluation (AWE) system that supports students in revising their essays is thus essential. In this talk, I will first present the natural language processing (NLP) technology behind eRevise+RF, an enhanced AWE system for assessing student essay revisions and providing revision feedback. Next, I will present evaluation results from a system deployment with 406 students in Pennsylvania and Louisiana, confirming its effectiveness in improving argumentative writing skills.
Towards Self-Improving Multimodal Models
Date: October 3, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Nanyun Peng
Affiliation: UCLA
Host: Hadi Amiri
Abstract
Large multimodal models (LMMs) have made impressive progress but still struggle with learning new concepts and solving complex reasoning tasks. This talk presents approaches toward self-improving multimodal models.
Making Computers Robust to Natural Language Negation
Date: October 10, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Eduardo Blanco
Affiliation: University of Arizona
Host: Hadi Amiri
Abstract
Negation is a ubiquitous linguistic phenomenon. This talk presents computational models to better understand negation and improve robustness.
TrustworthyML in the Era of Frontier Models
Date: October 17, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Chirag Agarwal
Affiliation: University of Virginia
Host: Hadi Amiri
Abstract
This talk explores trust, reasoning reliability, and safety challenges in large language models.
Towards Small, Open-Source, Multi-Modal Language Model Agents for Science and Society
Date: October 31, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Xuan Wang
Affiliation: Virginia Tech
Host: Hadi Amiri
Abstract
This talk presents small, open-source multimodal agents for scientific and societal applications.
Train, Reason, and Act Under Value Function Guidance
Date: November 7, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Kiante Brantley
Affiliation: Harvard
Host: Hadi Amiri
Abstract
This talk presents value-function-guided methods for improving LLM reasoning and efficiency.
Evaluating and Understanding LLMs: From Scientific Reasoning to Alignment as Judges
Date: November 19, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Arman Cohan
Affiliation: Yale University
Host: Hadi Amiri
Abstract
This talk presents recent work on evaluating LLMs in scientific reasoning and alignment.
Safeguarding Large Language Models in Practice
Date: December 5, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Nathalie Baracaldo
Affiliation: IBM Almaden Research Center
Host: Hadi Amiri
Abstract
Safeguarding large language models (LLMs) is a substantial endeavor that goes beyond simple alignment methods.
Spring 2025
Enabling and Evaluating Human-Agent Collaboration
Date: February 7, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Diyi Yang
Affiliation: Stanford
Host: Hadi Amiri
Abstract
Recent advances in large language models (LLMs) have revolutionized human-AI interaction, but their success depends on addressing key challenges like privacy and effective collaboration. In this talk, we first explore PrivacyLens, a general framework to evaluate privacy leakage in LLM agents’ actions, by extending privacy-sensitive seeds into agent trajectories. By evaluating state-of-the-art models, PrivacyLens reveals contextual and long-tail privacy vulnerabilities, even under privacy-enhancing instructions. We then introduce Co-Gym, a novel framework for studying and enhancing human-agent collaboration across various tasks. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance. Via PrivacyLens and Co-Gym, this talk highlights how to develop AI systems that are trustworthy and capable of fostering meaningful collaboration with human users.
Why AI Is W.E.I.R.D. And Shouldn't Be This Way
Date: February 14, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Rada Mihalcea
Affiliation: University of Michigan
Host: Hadi Amiri
Abstract
Recent years have witnessed remarkable advancements in AI, with language and vision models that have enabled progress in numerous applications and opened the door to the integration of AI in areas such as communication, transportation, healthcare, and arts. Yet, many of these models and their corresponding datasets are W.E.I.R.D. (Western, Educated, Industrialized, Rich, Democratic) and they are reflective of a small fraction of the population.(*) In this talk, I will show some of the limitations and lack of representation of current AI models, and highlight the need for cross-cultural language and vision models that can capture the diversity of behaviors, beliefs, and language expressions across different groups. I will also explore ways in which we can address these limitations by developing models that are re-centered around people and their unique characteristics. (*) W.E.I.R.D. is an acronym widely used in psychology to indicate the limitation of many of the studies carried out in the field.
Human-AI Collaboration in Evaluating Large Language Models
Date: February 28, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Wei Xu
Affiliation: Georgia Institute of Technology
Host: Hadi Amiri
Abstract
To support real-world applications more responsibly and further improve large language models (LLMs), it is essential to design reliable and reusable frameworks for their evaluation. In this talk, I will discuss three forms of human-AI collaboration for evaluation that combine the strengths of both: (1) the reliability and user-centric aspect of human evaluation, and (2) the cost efficiency and reproducibility offered by automatic evaluation. The first part focuses on systematically assessing LLMs’ favoritism towards Western culture, using a hybrid approach of manual effort and automated analysis. The second part will showcase an LLM-powered privacy preservation tool, designed to safeguard users against the disclosure of personal information. I will share some interesting findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of NLP models. Lastly, we will delve into the evaluation of LLM-generated texts, where human judgments can be used to train automatic evaluation metrics to detect errors. We also highlight the opportunity of engaging both laypeople and experts in evaluating LLM-generated simplified medical texts in high-stakes healthcare applications.
Overcoming Obstacles in NLP for Endangered Languages
Date: March 7, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Emily Prud'hommeaux
Affiliation: Boston College
Host: Hadi Amiri
Abstract
A majority of the world's languages lack sufficient resources to train the state-of-the-art NLP models we've come to expect for high-resource languages like English or Mandarin. The situation is particularly dire for endangered languages, which could benefit enormously from these technologies but will never have abundant high-quality training resources. In this talk, I will discuss some approaches for addressing these challenges in automatic speech recognition and machine translation, with a focus on several different endangered and under-resourced languages.
LLMs for Healthcare: Risks and Interpretability Methods to (Possibly) Mitigate Them
Date: March 21, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Byron Wallace
Affiliation: Northeastern University
Host: Hadi Amiri
Abstract
Large Language Models (LLMs) are poised to transform specialist fields like healthcare. Such models promise to free domain experts, including physicians, from drudgery, enabling better care to be delivered at scale. But the use of LLMs in healthcare—and similar high-stakes, specialized domains—brings real risks. Used naively, such models may worsen existing biases in practice. They might also result in medical errors owing to "hallucinations". In this talk I will discuss a few recent efforts designing and critically evaluating LLMs for medical language processing tasks, e.g., summarizing clinical notes in patient electronic health records (EHRs). I will highlight current limitations and associated risks of LLMs in the context of these applications, particularly related to robustness and bias. Finally, I will discuss recent work on adopting "mechanistic" interpretability methods in the space of healthcare as a potential means of mitigating these issues.
Understanding the Abilities of AI Systems: Memorization, Generalization, and Points in Between
Date: March 28, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Tom McCoy
Affiliation: Yale University
Host: Hadi Amiri
Abstract
Large language models (LLMs) can perform a wide range of tasks impressively well. To what extent are these abilities driven by shallow heuristics vs. deeper abstractions? I will argue that, to answer this question, we must view LLMs through the lens of generalization. That is, we should consider the data that LLMs were trained on so that we can identify whether and how their abilities go beyond their training data. In the analyses of LLMs that I will discuss, this perspective reveals both impressive strengths and surprising limitations. For instance, LLMs often produce sentence structures that are well-formed but that never appeared in their training data, yet they also struggle on some seemingly simple algorithmic tasks (e.g., decoding simple ciphers) in ways that are well-explained by training data statistics. In sum, to understand what AI systems are, we must understand what we have trained them to be.
A Retrieval and Structuring Approach for LLM-Enhanced, Theme-Focused Science Discovery
Date: April 4, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Jiawei Han
Affiliation: University of Illinois Urbana-Champaign
Host: Jie Wang
Abstract
Large Language Models (LLMs) may bring unprecedented power to scientific discovery. However, current LLMs may still encounter major challenges in effective scientific exploration due to their lack of in-depth, theme-focused data and knowledge. Retrieval-augmented generation (RAG) has recently become an interesting approach for augmenting LLMs with grounded, theme-specific datasets. We discuss the challenges of RAG and propose a retrieval and structuring (RAS) approach, which enhances RAG by improving retrieval quality and mining structures (e.g., extracting entities and relations and building knowledge graphs) to ensure the effective integration of theme-specific data with LLMs. We show the promise of this approach in augmenting LLMs and discuss its potential power for LLM-enabled science exploration.
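The retrieve-then-structure pipeline the abstract describes can be sketched at toy scale. Everything below (the two-document CORPUS, the term-overlap retriever, the regex-based triple extractor) is a hypothetical illustration under strong simplifying assumptions, not the RAS system itself:

```python
import re
from collections import defaultdict

# Toy theme-specific corpus (illustrative stand-in for a scientific corpus).
CORPUS = [
    "Metformin is a biguanide. Metformin treats type 2 diabetes.",
    "Aspirin is an NSAID. Aspirin inhibits COX enzymes.",
]

def retrieve(query, corpus):
    """Step 1 (retrieval): rank documents by simple term overlap with the query."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(reverse=True)
    return [doc for score, doc in scored if score > 0]

def mine_structure(docs):
    """Step 2 (structuring): extract (subject, relation, object) facts from
    simple 'X is a Y' / 'X treats/inhibits Y' patterns into a tiny knowledge graph."""
    kg = defaultdict(list)
    for doc in docs:
        for sent in doc.split("."):
            m = re.match(r"\s*(\w+) is an? ([\w\s]+)", sent)
            if m:
                kg[m.group(1)].append(("is_a", m.group(2).strip()))
            m = re.match(r"\s*(\w+) (treats|inhibits) ([\w\s]+)", sent)
            if m:
                kg[m.group(1)].append((m.group(2), m.group(3).strip()))
    return dict(kg)

docs = retrieve("what does metformin treat", CORPUS)
kg = mine_structure(docs)
print(kg)  # structured, theme-specific context that would ground the LLM prompt
```

In a real RAS-style system, the retriever and the entity/relation extractor would be learned models, but the shape of the pipeline (retrieve theme-specific documents, mine structure, then condition the LLM on both) is the same.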
Specializing LLMs for Reliability
Date: April 11, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Greg Durrett
Affiliation: UT Austin
Host: Hadi Amiri
Abstract
Large language models (LLMs) have advanced the frontiers of AI reasoning: they can synthesize information from multiple sources, derive new conclusions, and explain those conclusions to their users. However, LLMs do not do this reliably. They hallucinate facts, convincingly state incorrect deductions, and exhibit logical fallacies like confirmation bias. In this talk, I will describe my lab's work on making LLM systems reliable by introspecting their behavior. First, I will argue that automating fine-grained evaluation of LLM output provides a level of understanding necessary for further progress. I will describe the ingredients of effective automated evaluators and a state-of-the-art factuality evaluation system, MiniCheck, showing that analyzing the nature of hallucinations can help reduce them. Second, I will demonstrate that better understanding of LLMs' internal reasoning processes helps us train them to be more reliable. Our work shows that model interpretation techniques can advance training methodology and dataset curation for reasoning models. Finally, I will describe how deeper understanding of LLMs will let us tackle their most fundamental limitations, such as their inconsistency when given different inputs. I will propose how these pieces might soon be combined to form reliable AI systems.
Discourse Models with Language Models
Date: April 18, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Jessy Li
Affiliation: University of Texas at Austin
Host: Hadi Amiri
Abstract
How are sentences in a document connected, and why do they make the document feel “coherent”? Computational models of discourse aim to unravel this mystery by recovering the structural organization of texts, through which writers convey intent and meaning. In the first part of this talk, I will discuss our efforts on modeling human curiosity through question generation, and on understanding its connection with discourse representations based on the linguistic theory of Questions Under Discussion. We show that LLMs, with appropriate design and training, can resurface curiosity-driven questions and ground their elicitation and answers in text. Next, I will demonstrate how such generative discourse models can be used to measure discourse similarities in LLM-generated texts, as well as to derive explainable measures of information salience in LLMs using summarization as a behavioral probe.
Neuro-symbolic Approaches for Explainable Natural Language Processing
Date: April 25, 2025
Time: 11 a.m.
Location: Zoom
Speaker: Mihai Surdeanu
Affiliation: University of Arizona
Host: Hadi Amiri
Abstract
Deep learning approaches to natural language processing (NLP) such as GPT* have achieved tremendous successes recently. However, these systems are difficult to understand, augment, or maintain as needs shift. In this talk I will discuss two of our recent efforts that aim to bring explainability back into deep learning methods for NLP. In the first part of the talk, I will introduce an explainable approach for information extraction (IE), an important language understanding task that focuses on finding structured information in text, such as who did what to whom, when, and where. Our approach mitigates the tension between generalization and explainability by jointly training for the two goals. The proposed method uses a multi-task learning architecture, which jointly trains a classifier for information extraction, and a sequence model that labels words in the context that explain the decisions of the previous classifier. We show that, even with minimal guidance for what makes a good explanation, the sequence model learns to provide accurate explanations. Further, we show that the joint training generally improves the performance of the IE classifier. In the second part of the talk, I will discuss a neuro-symbolic architecture for information extraction that preserves the advantages of both directions, i.e., the generalization power of neural methods and the pliability of symbolic approaches. Our modular approach contains two components: a declarative rule-based model and a neural component. The former implements information extraction with a set of explainable rules that rely on syntax; the latter increases the generalizability of rules by semantically matching them over text. I'll show that the proposed approach outperforms all neural models on a challenging IE task. More importantly, I'll show that the underlying symbolic representation can be locally modified to correct model mistakes without retraining the neural component.
Fall 2024
Persuasion for Social Good: How to Build and Break Persuasive Chatbots
Date: October 25, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Weiyan Shi
Affiliation: Northeastern University
Host: Hong Yu
Abstract
AI research has so far focused on modeling common human skills, such as building systems to see, read, or talk. As these systems gradually achieve a human level on standard benchmarks, it is increasingly important to develop next-generation interactive AI systems with more advanced human skills, to function in realistic and critical applications such as providing personalized emotional support. In this talk, I will cover (1) how to build such expert-like AI systems specialized in social influence that can persuade, negotiate, and cooperate with other humans during conversations. (2) I will also discuss how humans perceive such specialized AI systems. This study validates the necessity of the Autobot Law and proposes guidance to regulate such systems. (3) As these systems become more powerful, AI safety problems become more important. I will also describe how to persuade AI models to jailbreak them and study AI safety problems. Finally, I will conclude with my long-term vision to build a natural interface between human intelligence and machine intelligence via dialogues, using a multi-angle approach that combines Artificial Intelligence, Human-Computer Interaction, and the social sciences, to develop expert AI systems for everyone.
Programming Language Principles for Distributed Systems
Date: November 1, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Ankush Das
Affiliation: Boston University
Host: Paul Downen
Abstract
With the proliferation of distributed systems, the design of safe, secure, and efficient software has become an ever more complex task. The heterogeneous nature of these distributed systems has further introduced domain-specific programming requirements such as inferring execution cost, accounting for randomized behavior, and preventing communication errors. To develop programming languages and reasoning tools for such multi-threaded environments, we need two main ingredients: concurrency and domain-specific support. In this talk, I will use session types as a base type system that already comes equipped with reasoning capabilities for message-passing concurrent systems. On top, I will introduce domain-specific support for three different domains: digital transactions, randomized systems, and program verification. Programming smart contracts comes with its unique challenges, which include enforcing protocols of interaction, tracking linear assets, and analyzing execution cost. To address these challenges, the talk introduces Nomos, which employs linear session types to enforce protocols and prevent assets from being duplicated or discarded. To predict execution cost, Nomos uses resource-aware types and automatic amortized resource analysis, a type-based technique for inferring cost bounds. For randomized systems, Nomos is further enhanced with probabilistic types that track the probability distribution of message exchanges in a distributed system. Finally, to verify concurrent programs, I will introduce dependent refinement session types that can naturally track intrinsic properties such as sizes and values in the type of messages, which can then be used for lightweight verification. The talk concludes with my future plans on exploring how programming languages can aid in the specification, verification, and possibly synthesis of cryptographic protocols.
Synthetic Data for Self-Evolving AI
Date: November 8, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Tianyi Zhou
Affiliation: University of Maryland, College Park
Host: Hadi Amiri
Abstract
Data is the new oil for training large AI models. However, the "oil" created by humans may run out someday, or grow much more slowly than the speed at which AI consumes it. Moreover, human-created data are less controllable in terms of quality, opinions, format, style, etc., and may lead to biases or privacy concerns when used for model training. Can we leverage the power of Generative AI to automatically create synthetic data in a more efficient, controllable, and safe manner, for training or benchmark purposes? How can we avoid model collapse caused by continuously training a model on self-generated synthetic data? In this talk, I will present our recent works that aim to investigate whether and how synthetic data can be created to improve large language models (LLMs) and vision-language models (VLMs), especially when the real data is imperfect. These works include Mosaic-IT (compositional data augmentation for instruction tuning), DEBATunE (data generation by LLM debate), Diffusion Curriculum (generative curriculum learning of low-quality images), and AutoHallusion (hallucination benchmark generation via automatic image editing). These projects are led by Ming Li, Yijun Liang, Xiyang Wu, and Tianrui Guan.
Reasoning about Programs’ Adaptivity, with applications to Adaptive Data Analysis
Date: November 15, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Marco Gaboardi
Affiliation: Boston University
Host: Paul Downen
Abstract
An adaptive program is a program that interacts with other components and whose choice for the next interaction depends on the results of previous interactions. Adaptive programs find applications in many areas of computer science, such as adaptive data analysis, the analysis of interactive protocols in security and privacy, database systems, etc. In many of these applications it is important to quantify the level of adaptivity of a program. In my talk, I will focus on adaptive programs in the context of adaptive data analysis. In this area, one is interested in guaranteeing that the result of a data analysis run on sample data does not differ too much from the result one would achieve by running the same analysis over the entire population. To achieve this goal, one can use several techniques that were designed to control the generalization errors of data analyses, but in order to choose well among the different techniques one has to know the adaptivity of a program. I will show how program analysis can help with this task. Concretely, I will first present a programming model for adaptive data analyses based on a simple imperative programming language that is suitable to integrate different techniques that can be used for controlling the generalization error. I will then introduce a program analysis for this language that, given an input program implementing an adaptive data analysis, generates an upper bound on the total number of queries that the data analysis will run, and more interestingly also an upper bound on the depth of the chain of queries implemented by the program. These two measures can be used to select the right technique to guarantee a bound on the generalization error of the input data analysis. I will then discuss limitations and potential future works.
Graph Representation Learning for Network Generation, Optimization, and Verbalization
Date: November 20, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Liang Zhao
Affiliation: Emory University
Host: Hong Yu
Abstract
Bridging the AI Translational Gap in Oncology
Date: November 22, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Danielle S. Bitterman
Affiliation: Harvard Medical School
Host: Hadi Amiri
Abstract
Machine Unlearning for Generative AI: A Model-Based Perspective
Date: December 6, 2024
Time: 11 a.m.
Location: Zoom
Speaker: Sijia Liu
Affiliation: Michigan State University
Host: Hadi Amiri
Abstract
In this talk, I will introduce the concept of Machine Unlearning (MU), a transformative approach to removing undesirable data influence or associated model capabilities from learned discriminative and generative models. To bridge the gap between exact and approximate unlearning, I will present a novel model-based perspective that integrates model sparsity, gradient-based weight saliency, and weight influence attribution. This model-centric approach achieves significant advancements in MU for vision and language models, balancing effectiveness, preserved utility, and enhanced efficiency. Additionally, I will explore the practical implications of MU in addressing critical challenges in AI safety.
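The model-based ingredients named above, gradient-based weight saliency combined with a sparse unlearning update, can be illustrated on a toy linear model. This is a hedged sketch with invented data, a least-squares loss, and a made-up saliency threshold (top-k gradient magnitude), not the talk's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" linear model and a forget set whose influence we want removed.
w = rng.normal(size=5)
X_forget = rng.normal(size=(20, 5))
y_forget = X_forget @ rng.normal(size=5)

def grad_mse(w, X, y):
    """Gradient of mean squared error with respect to the weights."""
    return 2 * X.T @ (X @ w - y) / len(X)

# Weight saliency: rank weights by gradient magnitude on the forget set and
# mark only the top-k most salient for updating (a sparse, model-based mask).
g = grad_mse(w, X_forget, y_forget)
k = 2
salient = np.zeros_like(w, dtype=bool)
salient[np.argsort(-np.abs(g))[:k]] = True

# Approximate unlearning: gradient *ascent* on the forget loss, applied only
# to the salient weights so the rest of the model (its utility) is untouched.
lr = 0.05
w_unlearned = w.copy()
for _ in range(50):
    g = grad_mse(w_unlearned, X_forget, y_forget)
    w_unlearned[salient] += lr * g[salient]

loss = lambda w: np.mean((X_forget @ w - y_forget) ** 2)
print(loss(w), loss(w_unlearned))  # forget-set loss rises after unlearning
assert np.allclose(w[~salient], w_unlearned[~salient])  # non-salient weights preserved
```

The sketch shows the trade-off the abstract emphasizes: the saliency mask localizes the edit so that forgetting (higher loss on the forget data) does not require touching the whole model.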