Bowen Cheng 程博文

Bowen is a researcher at Meta Superintelligence Labs (MSL).

Bowen was previously a Researcher at OpenAI, working on multimodal understanding and interaction. Before that, Bowen was a Senior Research Scientist at Tesla. Bowen received his Ph.D. in Electrical and Computer Engineering (ECE) from the University of Illinois Urbana-Champaign (UIUC). His Ph.D. advisors are Prof. Alexander Schwing and Prof. Thomas Huang (2017-2020).

Bowen has interned at FAIR NYC (Facebook AI Research, New York City), FAIR MPK (Facebook AI Research, Menlo Park), Google Research (Los Angeles), Microsoft Research (Redmond), and Microsoft Research Asia (Beijing, China).

Research Interests

I believe multimodal foundation models can change how humans interact with AI systems. I am interested in building a real-time multimodal interaction system with: (1) real-time streaming audio/video input and output, with infinite context and smooth interaction; (2) advanced long-term memory; and (3) staying up to date and proactive content creation.

Selected projects

Thinking with Images - initiated the research and foundational contributor
OpenAI o3 and o4-mini - core contributor
GPT-4.1 - core contributor
OpenAI Audio API - research
GPT-4o - core contributor (perception and advanced voice mode)
Tesla FSDv12 - core contributor
Mask2Former
MaskFormer
Panoptic-DeepLab

Please refer to my Google Scholar for a full list of my publications.