📝 Publications
🎙 Data Synthesis
Self-Foveate

(Findings of ACL 2025) Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation, Mingzhe Li, Xin Lu, Yanyan Zhao.
- Proposes an automated LLM-driven framework named Self-Foveate for instruction synthesis from unsupervised text.
- Introduces a “Micro-Scatter-Macro” multi-level foveation methodology guiding LLMs to extract fine-grained and diverse information.
- Demonstrates superior performance across multiple unsupervised corpora and model architectures.
🎙 Multimodal
VideoThinkBench

(arXiv) Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm, Jingqi Tong*, Yurong Mou*, Hangcheng Li*, Mingzhe Li*, Yongzhuo Yang*, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu.
- Introduces “Thinking with Video”, a new paradigm unifying visual and textual reasoning through video generation models.
- Develops VideoThinkBench, a reasoning benchmark for video generation models covering both vision-centric and text-centric tasks.
- Demonstrates that Sora-2 surpasses state-of-the-art vision-language models (VLMs) on several tasks.