Recent Publications (2024, 2025, 2026)
[Selected Technical Report]
VIDEO
VIDEO
[Conference Papers]
VIDEO
[CVPR 2026]
ProSoftArena: Evaluating Hierarchical Capabilities of Multimodal Agents in Professional Software Environments
Jiaxin Ai, Yukang Feng, Fanrui Zhang, Jianwen Sun, Zizhen Li, Chuanhao Li, Yifan Chang, Wenxiao Wu, Ruoxi Wang, Mingliang Zhai, Kaipeng Zhang†
[CVPR 2026 Findings]
From Static Snapshots to Dynamic Trajectories: Evaluating and Enhancing the Learning Pathways of Multimodal Large Language Models
Yukang Feng, Wenxiao Wu, Jianwen Sun, Chuanhao Li, Fanrui Zhang, Zizhen Li, Jiaxin Ai, Sizhuo Zhou, Yifan Chang, Changxin Gao, Shenglin Zhang, Kaipeng Zhang†
[ICLR 2026]
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, Kaipeng Zhang†
[ICLR 2026]
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Yang Zhou, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Haoyu Guo, Zizun Li, Kaijing Ma, Xinyue Li, Yating Wang, Haoyi Zhu, Mingyu Liu, Dingning Liu, Jiange Yang, Zhoujie Fu, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Kaipeng Zhang , Tong He†
[ICLR 2026]
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
Nianchen Deng, Lixin Gu, Shenglong Ye, Yinan He, Zhe Chen, Songze Li, Haomin Wang, Xingguang Wei, Tianshuo Yang, Min Dou, Tong He, Wenqi Shao, Kaipeng Zhang, Yi Wang, Botian Shi, Yanting Zhang, Jifeng Dai, Yu Qiao, Hongjie Zhang†, Wenhai Wang†
[AAAI 2026]
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang†
[NeurIPS 2025]
Sekai: A Video Dataset towards World Exploration
Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, xu Zhao Pan, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu†, Tong He, Yunde Jia, Kaipeng Zhang†
VIDEO
[NeurIPS 2025]
Neural-Driven Image Editing
Pengfei Zhou, Jie Xia, Xiaopeng Peng, Wangbo Zhao, Zilong Ye, Zekai Li, Suorong Yang, Jiadong Pan, Yuanxiang Chen, Ziqiao Wang, Kai Wang, Qian Zheng, Xiaojun Chang, Gang Pan, Shurong Dong†, Kaipeng Zhang† , Yang You
[NeurIPS 2025]
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou, Zekai Li, Zhiyuan Liang, Mingjia Shi, Xuanlei Zhao, Pengfei Zhou, Kaipeng Zhang† , Zhangyang Wang, Kai Wang†, Yang You
[EMNLP 2025]
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang†
[ICCV 2025]
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai, Pengfei Zhou, xu Zhao Pan, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun, Yukang Feng, Baojin Huang, Zhongyuan Wang†, Kaipeng Zhang†
[ICCV 2025]
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang, Ning Kang, Lewei Yao, Mengzhao Chen, Chengyue Wu, Songyang Zhang, Shuchen Xue, Yong Liu, Taiqiang Wu, Xihui Liu, Kaipeng Zhang , Shifeng Zhang, Wenqi Shao†, Zhenguo Li†, Ping Luo
[CVPR 2025 Oral]
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou*, Xiaopeng Peng*, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Ping Luo, Kaipeng Zhang†
[ICLR 2025]
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng*, Chuanhao Li*, Jin Wang*, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang† , Wenqi Shao†
[NeurIPS 2024 Spotlight]
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao†, Kaipeng Zhang†
[NeurIPS 2024]
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang†
[NeurIPS 2024]
Needle In A Multimodal Haystack
Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang , Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao†, Wenhai Wang†
[NeurIPS 2024]
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo*, Ruoyi Du*, Han Xiao*, Yangguang Li*, Dongyang Liu*, Rongjie Huang*, Wenze Liu*, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang , Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao†, Hongsheng Li†, Peng Gao†
[ICML 2024]
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying*, Fanqing Meng*, Jin Wang*, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Cunjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang† and Wenqi Shao†
[ICML 2024]
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Peng Gao*†, Renrui Zhang*, Chris Liu*, Longtian Qiu*, Siyuan Huang*, Weifeng Lin*, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang , Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li† and Yu Qiao
[Journal Papers]
[TBigData 2024]
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao*, Yutao Hu*, Peng Gao*, Meng Lei*, Kaipeng Zhang , Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao†, Ping Luo†
[Tutorial]