I am an undergraduate student at Shanghai Jiao Tong University, majoring in Intelligent Perception. My research interests lie in agent evaluation.
My work focuses on building benchmarks and evaluation frameworks for LLM-based code agents. I have contributed to repository-level benchmarks such as CoreCodeBench and PRDBench, including automated benchmark construction pipelines, fine-grained task design, and agent-driven evaluation methods.
Lingyue Fu, Bolun Zhang, Hao Guan, Yaoming Zhu, Lin Qiu, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Weinan Zhang, Yong Yu
AAMAS 2026
An agent-driven pipeline for diverse project-level code agent benchmarks. Introduces PRDBench (50 Python projects, 20 domains) and a fine-tuned PRDJudge evaluator with over 90% human alignment.
Lingyue Fu, Hao Guan, Bolun Zhang, Haowei Yuan, Yaoming Zhu, Jun Xu, Zongyu Wang, Lin Qiu, Xunliang Cai, Xuezhi Cao, Weiwen Liu, Weinan Zhang, Yong Yu
ACL 2026
A configurable repository-level benchmark that dissects coding capabilities via atomized tasks. Built with CorePipe, it extracts fine-grained tasks from Python repositories to evaluate LLMs beyond coarse pass rates.