GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang†
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen*, Yao Mu*, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding†, Ping Luo†
[Conference]
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Yuqi Lin, Hengjia Li, Wenqi Shao, Zheng Yang, Jun Zhao, Xiaofei He†, Ping Luo, Kaipeng Zhang†
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng*, Chuanhao Li*, Jin Wang*, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang†, Wenqi Shao†
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang*, Shuibo Zhang*, Wenqi Shao†, Kaipeng Zhang†, Yi Bin, Yu Wang, Ping Luo†
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang†
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao†, Kaipeng Zhang†
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang†
Needle In A Multimodal Haystack
Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao†, Wenhai Wang†
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo*, Ruoyi Du*, Han Xiao*, Yangguang Li*, Dongyang Liu*, Rongjie Huang*, Wenze Liu*, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao†, Hongsheng Li†, Peng Gao†
Towards Implicit Prompt For Text-To-Image Models
Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang† and Ping Luo†
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying*, Fanqing Meng*, Jin Wang*, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Cunjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang† and Wenqi Shao†
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Peng Gao*†, Renrui Zhang*, Chris Liu*, Longtian Qiu*, Siyuan Huang*, Weifeng Lin*, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li† and Yu Qiao
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao*, Yue Yang*, Kaipeng Zhang‡*, Wenqi Shao‡*, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji†
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue†
T3M: Text Guided 3D Human Motion Synthesis from Speech
Wenshuo Peng, Kaipeng Zhang†, Sai Qian Zhang†
[NAACL Findings 2024]
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng, Wenqi Shao†, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo†
[ACL Findings 2024]
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang†, Yang You†
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao*, Mengzhao Chen*, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo†
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
Peng Xu, Wenqi Shao†, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo†
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng, Kaipeng Zhang†, Yue Yang, Hao Zhang, Yu Qiao
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training
Yuqi Lin, Minghao Chen†, Kaipeng Zhang†, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai
Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization
Yue Yang, Kaipeng Zhang†, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo†
[Journal]
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng†, Ping Luo, Yu Qiao, Kaipeng Zhang†
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu*, Wenqi Shao†*, Kaipeng Zhang*, Peng Gao*, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo†
B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Nanning Zheng†, Kaipeng Zhang†
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao*, Yutao Hu*, Peng Gao*, Meng Lei*, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao†, Ping Luo†
HF-HRNet: a simple hardware friendly high-resolution network
Hao Zhang, Yujie Dun, Yixuan Pei, Shenqi Lai, Chengxu Liu, Kaipeng Zhang, Xueming Qian†
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction
Hao Zhang, Yongqiang Ma, Kaipeng Zhang, Nanning Zheng†, Shenqi Lai†
[Pattern Recognition 2024]
Education
Ph.D. in CS, The University of Tokyo, Tokyo, Japan
Apr. 2019 - Mar. 2022
M.S. in CS, National Taiwan University, Taipei, Taiwan
Sep. 2016 - Aug. 2018
B.Eng. in CS, Donghua University, Shanghai, China
Sep. 2012 - Jul. 2016
Selected Awards and Competitions
WAIC Young Outstanding Paper Award, 2022
World's TOP 2% Scientists (published by Stanford University), 2020 & 2021 & 2022 & 2023
JSPS Research Fellowships for Young Scientists, 2020
Tencent Rhino-Bird Elite Training Program, 2020
MSRA Fellowship Nomination Award, 2019
Emotion Recognition in the Wild: Engagement Prediction (ICMI 2019 Grand Challenge), 3rd place
Emotion Recognition in the Wild: Group-based Cohesion Prediction (ICMI 2019 Grand Challenge), 2nd place
Disguised Faces in the Wild Challenge (in conjunction with CVPR 2018), 1st place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2018 Grand Challenge), 2nd place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2017 Grand Challenge), 1st place
ChaLearn Looking at People Challenge: Accessories Classification (in conjunction with CVPR 2016), 1st place
ChaLearn Looking at People Challenge: Smile and Gender Classification (in conjunction with CVPR 2016), 1st place
Outstanding Undergraduate Thesis, 2016
Academic Service
Senior program committee member of IJCAI
Reviewer/Program committee member of NeurIPS, ICML, ICLR, AAAI, ICCV, ECCV, CVPR, BMVC, WACV and ACCV
Reviewer of TPAMI, TIP, TCSVT, TNNLS, TMM, TIFS, Neurocomputing, Pattern Recognition, and SPL
Work Experience
Researcher
Shanghai AI Lab
OpenGVLab
Shanghai, China
May 2022 - Present
Researcher
SenseTime
Research Institute
Shenzhen, China
Sep. 2018 - Mar. 2019
Intern
MSRA
Visual Computing Group
Beijing, China
Jan. 2018 - Jul. 2018
Consultant
ULSee
Face Team
Hangzhou, China
Oct. 2016 - Mar. 2018
Intern
Tencent
AI Lab & AI Advertisement Department
Shenzhen, China
Jul. 2017 - Aug. 2017
Sep. 2020 - Feb. 2021
Visiting Student
Shenzhen Institutes of Advanced Technology
Multimedia Research Center
Shenzhen, China
Jul. 2015 - Aug. 2016