Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang*, Yunice Chew*, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu
GPT-4o scores 36.6% on this benchmark; humans score 84.3%. The benchmark evaluates video reasoning without key-frame sampling bias.