YUANHAN (John) ZHANG

元瀚

Final-year PhD student at MMLab@NTU, advised by Prof. Ziwei Liu.

I build and evaluate multimodal large language models, with a focus on post-training and benchmarking.

New: Video-TT (ICCV'25) · LLaVA-Video & OneVision (TMLR'25)

Current Interests

Information-dense Video Reasoning: Building models that identify salient details, connect dispersed evidence, and infer causal structure in complex visual narratives.

Harness-centric LLM Systems: Designing harnesses that structure memory for effective in-context comprehension and encode success criteria as executable checks that test for correctness, not mere plausibility.

Benchmarking Model–Human Gaps: Developing diagnostic benchmarks that reveal systematic differences between models and humans and point to better optimization targets.

Featured Work

Full publication list →