Course Information

AIDA.4201 Vision Language Models

Id: 042975 Credits: 3-3

Description

This course studies vision-language models (VLMs), which jointly reason over visual and textual information. Topics include vision and language representation learning, cross-modal alignment, contrastive objectives, multimodal transformers, large-scale pretraining, and instruction-tuned vision-language systems. Students will analyze and implement modern VLM architectures such as CLIP-style models, multimodal LLMs, and retrieval-augmented VLMs. The course emphasizes theoretical principles, system design tradeoffs, evaluation, and ethical considerations, and culminates in a semester-long project.

Prerequisites

AIDA.3221 Deep Learning.

Course prerequisites/corequisites are determined by the faculty and approved by the curriculum committees. Students are required to fulfill these requirements prior to enrollment. For courses offered through online or GPS delivery, students are responsible for confirming with the instructor or department that all enrollment requirements have been satisfied before registering.
