Publications
♠ Innovator
AI Can Learn Scientific Taste, Jingqi Tong*, Mingzhe Li*, Hangcheng Li*, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu.
- Introduces Reinforcement Learning from Community Feedback (RLCF), leveraging large-scale citation data as training signals for scientific evaluation.
- Develops two models: Scientific Judge, trained on 700K paper pairs for scientific evaluation, and Scientific Thinker, which generates high-impact research proposals.
♠ Data Synthesis
Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation, Mingzhe Li†, Xin Lu, Yanyan Zhao.
- Proposes an automated LLM-driven framework named Self-Foveate for instruction synthesis from unsupervised text.
- Introduces a “Micro-Scatter-Macro” multi-level foveation methodology that guides LLMs to extract fine-grained and diverse information.
- Demonstrates superior performance across multiple unsupervised corpora and model architectures.
♠ Multimodal
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm, Jingqi Tong*, Yurong Mou*, Hangcheng Li*, Mingzhe Li*, Yongzhuo Yang*, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu.
- Introduces “Thinking with Video”, a new paradigm unifying visual and textual reasoning through video generation models.
- Develops VideoThinkBench, a reasoning benchmark for video generation models covering both vision-centric and text-centric tasks.
- Demonstrates that Sora-2 surpasses state-of-the-art vision-language models (VLMs) on several of these tasks.
♠ LLM Safety
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules, Di Wu, Yanyan Zhao, Xin Lu, Mingzhe Li, Bing Qin.
- Proposes STAR-S (Self-TAught Reasoning based on Safety rules), a framework that integrates learning to reason over safety rules into a self-taught loop.
- Introduces a synergistic cycle in which models reason about and reflect on safety rules, then are fine-tuned on the resulting traces to strengthen their safety reasoning.
- Demonstrates stronger defense against jailbreak attacks than baseline models through iterative improvement of safety-rule understanding.