Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang*, Yunice Chew*, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu
A video reasoning benchmark that eliminates key-frame sampling bias and exposes a roughly 48-point gap between GPT-4o (36.6%) and humans (84.3%).