Bolun Zhang
B.E. Student in Intelligent Perception
Shanghai Jiao Tong University
About Me

I am an undergraduate student at Shanghai Jiao Tong University, majoring in Intelligent Perception. My research interests lie in agent evaluation.

My work focuses on building benchmarks and evaluation frameworks for LLM-based code agents. I have contributed to repository-level benchmarks such as CoreCodeBench and PRDBench, including automated benchmark construction pipelines, fine-grained task design, and agent-driven evaluation methods.

Curriculum Vitae
Education
  • Shanghai Jiao Tong University
    School of Electronic Information and Electrical Engineering
    B.E. in Intelligent Perception
    Sep. 2022 – Jun. 2026
Experience
  • Meituan (Beijing)
    Large Language Models Intern
    Dec. 2024 – Jun. 2025
  • Apex Lab, Shanghai Jiao Tong University
    Research Assistant (Large Language Models)
    Nov. 2024 – Present
Honors & Awards
  • Intel Special Award & First Prize — China-US Young Maker Competition
    2023
News
2024
  • Dec 01: Started internship at Meituan (Beijing) on large language models for code generation and AI agent pipelines.
  • Nov 01: Joined Apex Lab at SJTU as a Research Assistant, working on large language models and reinforcement learning.
2023
  • Aug 01: Received Intel Special Award & First Prize at the China-US Young Maker Competition.
Selected Publications
Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation

Lingyue Fu, Bolun Zhang, Hao Guan, Yaoming Zhu, Lin Qiu, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Weinan Zhang, Yong Yu

AAMAS 2026

An agent-driven pipeline for diverse project-level code agent benchmarks. Introduces PRDBench (50 Python projects, 20 domains) and a fine-tuned PRDJudge evaluator with over 90% human alignment.

CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks

Lingyue Fu, Hao Guan, Bolun Zhang, Haowei Yuan, Yaoming Zhu, Jun Xu, Zongyu Wang, Lin Qiu, Xunliang Cai, Xuezhi Cao, Weiwen Liu, Weinan Zhang, Yong Yu

ACL 2026

A configurable repository-level benchmark that dissects coding capabilities via atomized tasks. Built with CorePipe, it extracts fine-grained tasks from Python repositories to evaluate LLMs beyond coarse pass rates.
