Homepage - Bolun Zhang

Bolun Zhang

B.E. Student in Intelligent Perception
Shanghai Jiao Tong University

bolun_zhang(at)sjtu.edu.cn

About Me

I am an undergraduate student at Shanghai Jiao Tong University, majoring in Intelligent Perception. My research interests lie in agent evaluation.

My work focuses on building benchmarks and evaluation frameworks for LLM-based code agents. I have contributed to repository-level benchmarks such as CoreCodeBench and PRDBench, including automated benchmark construction pipelines, fine-grained task design, and agent-driven evaluation methods.

Curriculum Vitae

Education

Shanghai Jiao Tong University

School of Electronic Information and Electrical Engineering
B.E. in Intelligent Perception

Sep. 2022 – Jun. 2026

Experience

Meituan (Beijing)

Large Language Models Intern

Dec. 2024 – Jun. 2025

Apex Lab, Shanghai Jiao Tong University

Research Assistant (Large Language Models)

Nov. 2024 – Present

Honors & Awards

Intel Special Award & First Prize — China-US Young Maker Competition

2023

News

2024

Started an internship at Meituan (Beijing), working on large language models for code generation and AI agent pipelines.

Dec 01

Joined Apex Lab at SJTU as a Research Assistant, working on large language models and reinforcement learning.

Nov 01

2023

Received Intel Special Award & First Prize at the China-US Young Maker Competition.

Aug 01

Selected Publications

Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation

Lingyue Fu, Bolun Zhang, Hao Guan, Yaoming Zhu, Lin Qiu, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Weinan Zhang, Yong Yu

AAMAS 2026

An agent-driven pipeline for diverse project-level code agent benchmarks. Introduces PRDBench (50 Python projects, 20 domains) and a fine-tuned PRDJudge evaluator with over 90% human alignment.

[Paper]

CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks

Lingyue Fu, Hao Guan, Bolun Zhang, Haowei Yuan, Yaoming Zhu, Jun Xu, Zongyu Wang, Lin Qiu, Xunliang Cai, Xuezhi Cao, Weiwen Liu, Weinan Zhang, Yong Yu

ACL 2026

A configurable repository-level benchmark that dissects coding capabilities via atomized tasks. Built with CorePipe, it extracts fine-grained tasks from Python repositories to evaluate LLMs beyond coarse pass rates.

[Paper]
