Biography

I am a Ph.D. student jointly at the College of Computer Science and Artificial Intelligence, Fudan University, and the Shanghai Innovation Institute, advised by Prof. Xipeng Qiu. Before that, I received my B.Eng. degree in Artificial Intelligence from Harbin Institute of Technology, where I worked with Prof. Yanyan Zhao.

I am broadly interested in Natural Language Processing and Machine Learning. My current research focuses on Reinforcement Learning, Self-Evolving Agents, and Synthetic Data Generation. I am particularly interested in using reinforcement learning and its derivative techniques to stimulate the self-evolving capabilities of LLM-based agents in real-world environments.

If you are interested in any form of academic collaboration, please feel free to email me at mzli@ir.hit.edu.cn.

News

2026.03

We released AI Can Learn Scientific Taste, which ranked #1 on the Hugging Face Daily Papers monthly leaderboard.

2026.03

One paper, “Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm”, was accepted to CVPR 2026.

2025.08

Started my internship at the Shanghai Innovation Institute.

2025.03

One paper, “Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation”, was accepted to Findings of ACL 2025.

Publications

^* Equal contribution. ^† Project lead. The leading papers are highlighted.

arXiv 2026

AI Can Learn Scientific Taste

Jingqi Tong^*, Mingzhe Li^*, Hangcheng Li^*, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu.

Introduces Reinforcement Learning from Community Feedback (RLCF), using 720K field- and time-matched citation pairs to learn community preferences for high-impact research.
Trains Scientific Judge and Scientific Thinker for impact assessment and ideation; the 30B judge reaches 82.7% accuracy, while the thinker outperforms baselines on idea potential.

ACL (Findings) 2025

Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation

Mingzhe Li^†, Xin Lu, Yanyan Zhao.

Introduces Self-Foveate, a “Micro-Scatter-Macro” pipeline that extracts fine-grained details, cross-region relations, and holistic patterns from unsupervised text.
Adds iterative re-synthesis to improve source fidelity; experiments across diverse settings show consistent gains in instruction diversity, difficulty, and downstream performance.

CVPR 2026

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Jingqi Tong^*, Yurong Mou^*, Hangcheng Li^*, Mingzhe Li^*, Yongzhuo Yang^*, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu.

Proposes “Thinking with Video” and VideoThinkBench, using generated video frames as a shared medium for dynamic visual and text-centric multimodal reasoning.
Shows Sora-2 is competitive with leading VLMs on visual tasks and reaches 92.0% on MATH and 69.2% on MMMU; few-shot learning and self-consistency further improve reasoning.

arXiv 2026

STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules

Di Wu, Yanyan Zhao, Xin Lu, Mingzhe Li, Bing Qin.

Proposes STAR-S, an iterative self-taught loop that elicits safety-rule reasoning, repairs failures through guided reflection, and fine-tunes on the resulting traces.
Across six jailbreak and two over-refusal benchmarks, STAR-S outperforms safety-alignment baselines while balancing over-refusal and preserving general capabilities.

Honors and Awards

2026.06: Outstanding Graduate of Heilongjiang Province.
2026.06: Outstanding Graduate of Harbin Institute of Technology.
2026.06: Outstanding Undergraduate Thesis of Harbin Institute of Technology (Top 100 Outstanding Theses).
2025.01: Top Ten Outstanding Learning Stars of Harbin Institute of Technology (nominee).
2023.12: Outstanding Student of Heilongjiang Province.

Education

2026.09 - 2031.07 (Expected), Ph.D. student in Computer Science and Technology, Fudan University (jointly with the Shanghai Innovation Institute), Shanghai.
2022.08 - 2026.07, B.S. in Artificial Intelligence, Harbin Institute of Technology, Harbin.

Academic Services

Conference Reviewer: ICLR, ACL, CVPR.

Mingzhe Li / 李明哲