11/18/2025
By Kshitij Kumar Srivastava
The Kennedy College of Sciences, Department of Computer Science, invites you to attend a Master of Science in Computer Science thesis defense by Kshitij Kumar Srivastava on "S3: Stable Subgoal Selection by Constraining Uncertainty of Coarse Dynamics in Hierarchical Reinforcement Learning."
Candidate Name: Kshitij Kumar Srivastava
Defense Date: Friday, November 21, 2025
Time: 10 a.m. to noon
Thesis Title: S3: Stable Subgoal Selection by Constraining Uncertainty of Coarse Dynamics in Hierarchical Reinforcement Learning
Location: ETIC 445 and via Zoom
Committee Members:
- Kshitij Jerath (Advisor), Francis College of Engineering, UMass Lowell
- Maru Cabrera (Member), Miner School of Computer and Information Sciences, UMass Lowell
- Hadi Amiri (Member), Miner School of Computer and Information Sciences, UMass Lowell
- Reza Azadeh (Member), Miner School of Computer and Information Sciences, UMass Lowell
Abstract:
Hierarchical Reinforcement Learning (HRL) aims to separate strategic planning from primitive execution. It has been widely successful on long-horizon, complex tasks where flat RL algorithms struggle to learn. However, while the low-level agent in HRL benefits from dense feedback and abundant trial opportunities, the high-level agent receives sparse, delayed feedback from the environment, and its performance depends on the low-level agent's execution capability. In this work, we study whether subgoal selection by the high-level agent can be performed more strategically by providing it with dynamics-aware intrinsic motivation. Since motivation based on primitive transition dynamics would require broad coverage of the state-action space, we propose to use coarse dynamics, i.e., environment transitions aggregated over multiple steps at the temporal scale at which the high-level agent operates. This approach stabilizes the high-level policy by learning to minimize the predictive uncertainty associated with the coarse dynamics and provides a guided structure for navigation. We model the predictive uncertainty by evaluating different dispersion metrics approximated with a Mixture Density Network (MDN). Empirically, we observe that a dense, dynamics-aware intrinsic reward leads to risk-averse subgoal selection, enabling the agent to outperform state-of-the-art HRL methods in non-stationary, long-horizon environments.
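For readers unfamiliar with the MDN component mentioned in the abstract, the sketch below illustrates one way such a model could be used: a network that, given a state and a proposed subgoal, outputs a Gaussian mixture over the aggregated multi-step state change, from which a dispersion metric can be computed. This is not the thesis code; the class name, dimensions, and the choice of total mixture variance as the dispersion metric are illustrative assumptions.

```python
# Illustrative sketch (not the thesis implementation) of a Mixture Density Network
# over coarse dynamics, i.e., the aggregated state change following a subgoal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseDynamicsMDN(nn.Module):
    """Predicts a Gaussian mixture over the multi-step state change that
    follows a high-level subgoal choice (hypothetical interface)."""
    def __init__(self, state_dim, subgoal_dim, n_components=5, hidden=128):
        super().__init__()
        self.k = n_components
        self.out_dim = state_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim + subgoal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.pi = nn.Linear(hidden, self.k)                         # mixture weight logits
        self.mu = nn.Linear(hidden, self.k * self.out_dim)          # component means
        self.log_sigma = nn.Linear(hidden, self.k * self.out_dim)   # log std deviations

    def forward(self, state, subgoal):
        h = self.body(torch.cat([state, subgoal], dim=-1))
        pi = F.softmax(self.pi(h), dim=-1)                          # (B, K)
        mu = self.mu(h).view(-1, self.k, self.out_dim)              # (B, K, D)
        sigma = self.log_sigma(h).view(-1, self.k, self.out_dim).exp()
        return pi, mu, sigma

    def nll(self, state, subgoal, coarse_delta):
        """Negative log-likelihood of an observed aggregated transition."""
        pi, mu, sigma = self(state, subgoal)
        comp = torch.distributions.Normal(mu, sigma)
        log_prob = comp.log_prob(coarse_delta.unsqueeze(1)).sum(-1)  # (B, K)
        return -torch.logsumexp(torch.log(pi + 1e-8) + log_prob, dim=-1).mean()

    def dispersion(self, state, subgoal):
        """One possible dispersion metric: total mixture variance summed over
        state dimensions (law of total variance)."""
        pi, mu, sigma = self(state, subgoal)
        mean = (pi.unsqueeze(-1) * mu).sum(1)                        # mixture mean (B, D)
        var = (pi.unsqueeze(-1) * (sigma ** 2 + mu ** 2)).sum(1) - mean ** 2
        return var.sum(-1)                                           # (B,)
```

Under these assumptions, the high-level agent could receive an intrinsic penalty proportional to the dispersion of its chosen subgoal (for example, reward minus a coefficient times `dispersion(state, subgoal)`), steering subgoal selection toward regions where the coarse dynamics are predictable.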