AI Researcher

Hi, I'm
Jie Qin

Senior Algorithm Researcher

Meituan | Beidou Talent Program

I am currently a Senior Algorithm Researcher at Meituan (Beidou Talent Program), focusing on unified multimodal large models. I received my Ph.D. in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2024. My research interests encompass multimodal perception, vision foundation models, and multi-agent collaborative generation systems.

Research Interests

Unified Multimodal Large Models (MLLMs), Vision Foundation Models (VFMs), Multimodal Perception, Generative Models & Agents

jayqinliu@gmail.com

Google Scholar

Latest News

2026.02
One paper was accepted to CVPR 2026.
2026.01
One paper was accepted to ICLR 2026.
2025.09
One paper was accepted to NeurIPS 2025.
2024.07
One paper was accepted to ECCV 2024.
2024.06
Received my Ph.D. from the Institute of Automation, CAS!
2024.02
One paper was accepted to AAAI 2024.
2023.09
One paper was accepted to ICCV 2023.
2023.03
One paper was accepted to CVPR 2023.
2022.09
One paper was accepted to ECCV 2022.
2022.02
One paper was accepted to AAAI 2022.

Experience

2024.06 - Present

Senior Algorithm Researcher

Meituan | Beidou Talent Program | M17-MM Dept

2021.06 - 2023.12

Algorithm Intern

ByteDance | Intelligent Creation AutoML Platform

2019.12 - 2020.12

Algorithm Intern

Horizon Robotics | Fundamental Vision Algorithm Dept.

Education

2019.09 - 2024.06

Institute of Automation, Chinese Academy of Sciences

Ph.D. (Direct) | Pattern Recognition & Intelligent Systems

2015.09 - 2019.06

Nanjing University of Aeronautics and Astronautics

B.E. | College of Automation

Ranked Top 3% in the major.

Research Focus

Multimodal Large Language Models (MLLMs)

Focusing on building unified end-to-end frameworks that integrate multimodal understanding and generation. Exploring architectures such as stacked autoregressive models.

Representative Works

[STAR] [UniViTAR]

Vision Foundation Models (VFMs)

Designing scalable and efficient vision transformers capable of handling native resolutions and dynamic sequences. Investigating 100% codebook utilization in vector-quantized networks.

Representative Works

[FVQ]

Multimodal Perception & Segmentation

Tackling open-vocabulary, weakly supervised, and semi-supervised image segmentation by aligning cross-modal features and discovering additional supervision for robust visual perception.

Representative Works

[FreeSeg] [MGD] [AMR]

Generative Models & Embodied Agents

Developing LLM-driven multi-agent collaborative frameworks for complex image generation tasks and visual-guided robotic systems for real-world automated measurement and interaction.

Representative Works

[DiffusionAgent]

Selected Publications

STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning

Jie Qin, Jiancheng Huang, Limeng Qiao, Lin Ma

arXiv

Proposed a 'Stacked AR Expansion + 4-Stage Progressive Training' scheme and a self-developed FullVQ discretizer, achieving leading generation/editing abilities without compromising understanding metrics.

PDF Code Project

UniViTAR: Unified Vision Transformer with Native Resolution

Limeng Qiao, Yiyang Gan, Bairui Wang, Jie Qin, Shuang Xu, Siqi Yang, Lin Ma

NeurIPS (CCF-A), 2025

Systematically upgraded ViT to construct a unified vision foundation model supporting native resolution and dynamic sequences, achieving SOTA on multiple tasks.

PDF Code

Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization

Yifan Chang, Jie Qin, Limeng Qiao, Xiaofeng Wang, Zheng Zhu, Lin Ma, Xingang Wang

ICLR (CCF-A), 2026

Proposed FVQ (FullVQ) to optimize code vectors through a compress-process-recover pipeline, achieving 100% codebook utilization.

PDF Code

DiffusionAgent: Navigating Expert Models for Agentic Image Generation

Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Xuefeng Xiao, Rui Wang, Shilei Wen

arXiv

Designed a multi-agent collaborative generation framework driven by LLMs, implementing task decomposition, expert scheduling, and self-optimization via 'Plan-Execute-Reflect' cycles.

PDF Code Project

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Yuxi Ren, Xuefeng Xiao, et al.

CVPR (CCF-A), 2023

Proposed a unified multimodal segmentation model fusing cross-modal features for open-vocabulary unified modeling in semantic, instance, and panoptic tasks.

PDF Code Project

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Jie Qin, Jie Wu, Ming Li, Xuefeng Xiao, Min Zheng, Xingang Wang

ECCV (CCF-B), 2022

Proposed a multi-granularity distillation scheme (MGD) to facilitate lightweight semi-supervised semantic segmentation.

PDF Code

ResizeMix: Mixing Data while Preserving Object Information and Label Validity

Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, Xinggang Wang

CVMJ (IF=6.9, SCI-I), 2023

Introduced a novel data augmentation method, ResizeMix, that mixes data by resizing patches to preserve object information and label validity.

PDF

WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

Xin-Jian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

ECCV (CCF-B), 2024

Explored weakly-supervised part segmentation leveraging the prior knowledge from vision foundation models.

PDF

Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation

Jie Qin, Jie Wu, Xuefeng Xiao, Lujun Li, Xingang Wang

AAAI (CCF-A), 2022

Proposed an AMR scheme to discover additional supervision and narrow the gap between full and weak supervision.

PDF Code

Honors & Awards

Champion, CVPR 2022 Instance Segmentation on Synthetic Data Challenge

5th Place, CVPR 2022 3rd Agriculture Vision Challenge

Merit Student, University of Chinese Academy of Sciences (2021, 2022)

Outstanding Student Leader, University of Chinese Academy of Sciences (2021)

National Encouragement Scholarship (2016, 2017)

Academic Services

Journal Reviewer

IEEE TPAMI, IEEE TIP, IJCV, TCSVT, PR

Conference Reviewer

CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, ACM MM, IJCAI
