FDS CS Colloquium

Video Understanding and Generation with Multimodal Foundation Models

Speaker: Ming-Hsuan Yang

Professor, University of California, Merced
Research Scientist at Google DeepMind

Thursday, December 5, 2024

4:00PM - 5:00PM

Location: Davies Auditorium, 15 Prospect St, New Haven, CT 06511


Abstract: Recent advances in vision and language models have substantially improved a wide range of visual understanding and generation tasks. In this talk, I will present our latest research on effective tokenizers for transformers and discuss our efforts to adapt frozen large language models to a range of vision tasks, including visual classification, video-text retrieval, visual captioning, visual question answering, visual grounding, video generation, stylization, outpainting, and video-to-audio conversion. If time permits, I will also share some recent findings in 3D vision.

Bio: Ming-Hsuan Yang is a Professor at UC Merced and a Research Scientist at Google DeepMind. He received the Google Faculty Award in 2009, the NSF CAREER Award in 2012, and the Nvidia Pioneer Research Award in 2017 and 2018. His paper awards include Best Paper Honorable Mention at UIST 2017 and CVPR 2018, Best Student Paper Honorable Mention at ACCV 2018, the Longuet-Higgins Prize (test-of-time award) at CVPR 2023, and Best Paper at ICML 2024. He serves as Associate Editor-in-Chief of PAMI and as an Associate Editor of IJCV; previously, he was Editor-in-Chief of CVIU and a program co-chair of ICCV 2019. He is a Fellow of both the IEEE and the ACM.

Host: Alex Wong
