Yi Liu (刘熠)

About Me

I currently work at Honor Device Co., Ltd. as the project leader (PL) of the On-device VLM Group, focusing on Vision-Language Models and Video Understanding. I received my Ph.D. degree in 2024 from MMLab@SIAT, University of Chinese Academy of Sciences (UCAS), supervised by Prof. Yu Qiao and Prof. Yali Wang. I was a research intern at Shanghai AI Laboratory from 2022 to 2023. I received my B.Eng. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2019.

Publications

MagicVL-2B: Empowering Vision-Language Models on Mobile Devices with Lightweight Visual Encoders via Curriculum Learning, arXiv, 2025 (AAAI 2026 under review, first author)

MagicGen: A Universal Multimodal Data Synthesis Agent for Domain-Specific Vision-Language Model Tuning, arXiv, 2025 (in progress, first corresponding author)

E-VRAG: Enhancing Long Video Understanding with Resource-Efficient Retrieval Augmented Generation, arXiv, 2025 (AAAI 2026 under review, first corresponding author)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking, arXiv, 2025 (NeurIPS 2025 under review, second corresponding author)

LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering, International Journal of Computer Vision, 2025 (IJCV, CAS Q1, IF=9.3, co-first author, listed third)

MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding, IEEE Signal Processing Letters, 2024 (SPL, CAS Q2, IF=3.9, first author)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark, IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 (CVPR, CCF-A conference, sixth author)

F2S-Net: Learning Frame-To-Segment Prediction for Online Action Detection, Journal of Real-Time Image Processing, 2024 (JRTIP, CAS Q3, IF=3.0, first author)

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery, IEEE Transactions on Multimedia, 2023 (TMM, CAS Q1, IF=9.7, co-first author, listed second)

Learning Discriminative Feature Representation for Open Set Action Recognition, ACM International Conference on Multimedia, 2023 (ACM MM, CCF-A conference, co-first author, listed second)

InternVideo: General Video Foundation Models via Generative and Discriminative Learning, arXiv, 2022 (SCIS under review, ninth author)

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization, IEEE Transactions on Image Processing, 2022 (TIP, CAS Q1, IF=13.7, first author)

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection, International Conference on Pattern Recognition, 2022 (ICPR, CCF-C conference, first author)

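Research on the Task and Methods of Online Start Detection in Short-Video Scenarios (in Chinese; original title: 短视频场景在线起始检测任务及方法研究), Journal of Integration Technology (集成技术), 2021 (co-first author, listed second)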

Experience

Workshops

Student organizer of ECCV 2022 DeeperAction Challenge, Track 1: Temporal Action Localization

Student organizer of ICPR 2022 VideoPipe Challenge, Track 2: Temporal Defect Localization

Student organizer of ICCV 2021 DeeperAction Challenge, Track 1: Temporal Action Localization

1st Prize in ECCV 2022 Ego4D Episodic Memory Challenge, Moments Queries Track

1st Prize in ECCV 2022 Ego4D Episodic Memory Challenge, Looking At Me Track


**Senior Engineer at Honor Device Co., Ltd.**

Personal Email: yiliu61richard@gmail.com

Research Interests: Vision-Language Models, Video Understanding
