Selected Recent Publications (in 2024)
[Preprint]
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang†, Wenqi Shao†
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao†, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo†
AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions
Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang†
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang†, Yang You†
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han*, Renrui Zhang*, Wenqi Shao*, Peng Gao*, Peng Xu*, Han Xiao*, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue†, Hongsheng Li†, Yu Qiao†
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu*, Wenqi Shao†*, Kaipeng Zhang*, Peng Gao*, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo†
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao*, Yutao Hu*, Peng Gao*, Meng Lei*, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao†, Ping Luo†
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang, Wenqi Shao†, Mengzhao Chen, Chengyue Wu, Yong Liu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo†
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen*, Yao Mu*, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding†, Ping Luo†
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang*, Kaixiong Gong*, Kaipeng Zhang†, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue†
[Conference/Journal]
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang†
Multimodal LLM [NeurIPS 2024] Paper
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao†, Kaipeng Zhang†
Multimodal LLM [NeurIPS 2024] Paper
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang†
Text-to-Video [NeurIPS 2024] Paper
Needle In A Multimodal Haystack
Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao†, Wenhai Wang†
Multimodal LLM [NeurIPS 2024] Paper
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo*, Ruoyi Du*, Han Xiao*, Yangguang Li*, Dongyang Liu*, Rongjie Huang*, Wenze Liu*, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao†, Hongsheng Li†, Peng Gao†
Multimodal LLM [NeurIPS 2024] Paper
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng†, Ping Luo, Yu Qiao, Kaipeng Zhang†
Multimodality [IJCV 2024] Paper
Towards Implicit Prompt For Text-To-Image Models
Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang† and Ping Luo†
Text-to-Image [ICML 2024] Paper
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying*, Fanqing Meng*, Jin Wang*, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Cunjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang† and Wenqi Shao†
Multimodal LLM [ICML 2024] Paper
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Peng Gao*†, Renrui Zhang*, Chris Liu*, Longtian Qiu*, Siyuan Huang*, Weifeng Lin*, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li† and Yu Qiao
Multimodal LLM [ICML 2024] Paper
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao*, Yue Yang*, Kaipeng Zhang‡*, Wenqi Shao‡*, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji†
Text-to-Image [CVPR 2024] Paper
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue†
Multimodal LLM [CVPR 2024] Paper
T3M: Text Guided 3D Human Motion Synthesis from Speech
Wenshuo Peng, Kaipeng Zhang†, Sai Qian Zhang†
Multimodality [NAACL Findings 2024] Paper
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng, Wenqi Shao†, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo†
Multimodal LLM [ACL Findings 2024] Paper
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang†, Yang You†
Dataset Distillation [ICLR 2024] Paper
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao*, Mengzhao Chen*, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo†
Efficient LLM [ICLR 2024] Paper
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
Peng Xu, Wenqi Shao†, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo†
Efficient LLM [ICLR 2024] Paper
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng, Kaipeng Zhang†, Yue Yang, Hao Zhang, Yu Qiao
Multimodality [AAAI 2024] Paper
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training
Yuqi Lin, Minghao Chen†, Kaipeng Zhang†, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai
Multimodality [AAAI 2024] Paper
Education
Ph.D. in CS, The University of Tokyo, Tokyo, Japan
Apr. 2019 - Mar. 2022
M.S. in CS, National Taiwan University, Taipei, Taiwan
Sep. 2016 - Aug. 2018
B.Eng. in CS, Donghua University, Shanghai, China
Sep. 2012 - Jul. 2016
Selected Awards and Competitions
WAIC Young Outstanding Paper Award, 2022
World's Top 2% Scientists (published by Stanford University), 2020-2023
JSPS Research Fellowships for Young Scientists, 2020
Tencent Rhino-Bird Elite Training Program, 2020
MSRA Fellowship Nomination Award, 2019
Emotion Recognition in the Wild: Engagement Prediction (ICMI 2019 Grand Challenge), 3rd place
Emotion Recognition in the Wild: Group-based Cohesion Prediction (ICMI 2019 Grand Challenge), 2nd place
Disguised Faces in the Wild Challenge (in conjunction with CVPR 2018), 1st place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2018 Grand Challenge), 2nd place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2017 Grand Challenge), 1st place
ChaLearn Looking at People Challenge: Accessories Classification (in conjunction with CVPR 2016), 1st place
ChaLearn Looking at People Challenge: Smile and Gender Classification (in conjunction with CVPR 2016), 1st place
Outstanding Undergraduate Thesis, 2016
Academic Service
Senior Program Committee member of IJCAI
Reviewer/Program Committee member of NeurIPS, ICML, ICLR, AAAI, ICCV, ECCV, CVPR, BMVC, WACV, and ACCV
Reviewer of TPAMI, TIP, TCSVT, TNNLS, TMM, TIFS, Neurocomputing, Pattern Recognition, and SPL
Work Experience
Researcher
Shanghai AI Lab
OpenGVLab
Shanghai, China
May 2022 - Present
Researcher
SenseTime
Research Institute
Shenzhen, China
Sep. 2018 - Mar. 2019
Intern
MSRA
Visual Computing Group
Beijing, China
Jan. 2018 - Jul. 2018
Consultant
ULSee
Face Team
Hangzhou, China
Oct. 2016 - Mar. 2018
Intern
Tencent
AI Lab & AI Advertisement Department
Shenzhen, China
Jul. 2017 - Aug. 2017
Sep. 2020 - Feb. 2021
Visiting Student
Shenzhen Institutes of Advanced Technology
Multimedia Research Center
Shenzhen, China
Jul. 2015 - Aug. 2016