📝 Publications

🎙 Data Synthesis

Self-Foveate

Findings of ACL 2025. Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation, Mingzhe Li, Xin Lu, Yanyan Zhao.

  • Proposes Self-Foveate, an automated LLM-driven framework for synthesizing instructions from unsupervised text.
  • Introduces a “Micro-Scatter-Macro” multi-level foveation methodology guiding LLMs to extract fine-grained and diverse information.
  • Demonstrates superior performance across multiple unsupervised corpora and model architectures.

🎙 Multimodal

VideoThinkBench

arXiv. Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm, Jingqi Tong*, Yurong Mou*, Hangcheng Li*, Mingzhe Li*, Yongzhuo Yang*, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu.

  • Introduces “Thinking with Video”, a new paradigm unifying visual and textual reasoning through video generation models.
  • Develops VideoThinkBench, a reasoning benchmark for video generation models covering both vision-centric and text-centric tasks.
  • Demonstrates that Sora-2 surpasses SOTA VLMs on several tasks.