Publications
♠ Innovator
AI Can Learn Scientific Taste, Jingqi Tong*, Mingzhe Li*, Hangcheng Li*, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu.
- Introduces Reinforcement Learning from Community Feedback (RLCF), leveraging large-scale citation data as training signals for scientific evaluation.
- Develops two models: Scientific Judge, trained on 700K paper pairs for scientific evaluation, and Scientific Thinker, which generates high-impact research proposals.
♠ Data Synthesis
Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation, Mingzhe Li†, Xin Lu, Yanyan Zhao.
- Proposes an automated LLM-driven framework named Self-Foveate for instruction synthesis from unsupervised text.
- Introduces a “Micro-Scatter-Macro” multi-level foveation methodology that guides LLMs to extract fine-grained and diverse information.
- Demonstrates superior performance across multiple unsupervised corpora and model architectures.
♠ Multimodal
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm, Jingqi Tong*, Yurong Mou*, Hangcheng Li*, Mingzhe Li*, Yongzhuo Yang*, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu.
- Introduces “Thinking with Video”, a new paradigm unifying visual and textual reasoning through video generation models.
- Develops VideoThinkBench, a reasoning benchmark for video generation models covering both vision-centric and text-centric tasks.
- Demonstrates that Sora-2 surpasses state-of-the-art vision-language models (VLMs) on several of these tasks.
♠ LLM Safety
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules, Di Wu, Yanyan Zhao, Xin Lu, Mingzhe Li, Bing Qin.
- Proposes STAR-S (Self-TAught Reasoning based on Safety rules), a framework that integrates learning to reason over safety rules into a self-taught loop.
- Introduces a synergistic cycle in which models reason about and reflect on safety rules, then are fine-tuned on the resulting traces to strengthen their safety reasoning.
- Demonstrates stronger defense against jailbreak attacks than baseline models through iterative improvement of safety-rule understanding.