Hongzheng Chen

I am a Ph.D. candidate in the Computer Systems Laboratory at Cornell University, advised by Prof. Zhiru Zhang. I received my B.E. in Computer Science with highest honors from Sun Yat-sen University.

I have been fortunate to intern at Google, NVIDIA, AWS, and ByteDance, where I worked on machine learning systems. My research is supported by the SRC JUMP 2.0 ACE Center and has received recognition through three Best Paper nominations and a Best Paper Award at top-tier hardware conferences. I was also named a 2024 ML and Systems Rising Star and selected as a finalist for the 2025 Qualcomm Innovation Fellowship.

Research Highlights

My research focuses on building compilers, programming systems, and accelerators for large-scale machine learning workloads, with an emphasis on large language models (LLMs). In particular, I aim to bridge the productivity-performance gap between emerging machine learning applications (e.g., generative AI) and heterogeneous accelerators.

^★ indicates projects where I am the project lead

Accelerator Programming Frameworks:

AutoWS^★ introduces automatic warp specialization to discover efficient LLM kernels such as FlashAttention on NVIDIA Hopper and Blackwell GPUs. The proposed NVWS dialect has been merged upstream into OpenAI Triton.
Slapo [ASPLOS’24]^★ is a distributed LLM training framework deployed at AWS, designed to balance usability and performance. It has influenced the design of ByteDance’s veScale and Meta’s TorchTitan.
BGL [NSDI’23] is a production-scale GNN training framework used at ByteDance, reducing billion-node graph training time from weeks to days.

Accelerator Design Languages:

Allo [PLDI’24]^★ is a Python/MLIR-based programming language for efficient ML accelerator design. It has been adopted by 10+ universities and companies, including Cornell, UCLA, UIUC, Brown, UofT, UVA, UofChicago, UIC, Imperial, Tsinghua, SJTU, Intel, AMD, and Microsoft. Allo integrates multiple supporting tools, including the PEQC [FPGA’24, 🏆 Best Paper Award] equivalence checker, the HeteroFlow [FPGA’22] dataflow programming framework, and the ARIES [FPGA’25, Best Paper Nominee] backend for AMD AI Engine.

Accelerator Architectures for ML:

LLM-FPGA [FCCM’24]^★ pioneers FPGA-based dataflow accelerator design for LLMs.
FracBNN [FPGA’21, Best Paper Nominee] is an efficient FPGA accelerator for quantized CNNs.

ML for Systems:

FunCompiler^★ is an LLM agentic framework that leverages AlphaEvolve/FunSearch for compiler optimization.
HeuriGym [arXiv’25]^★ is an agentic benchmarking framework for evaluating LLMs’ reasoning capabilities on scientific and engineering optimization problems.
DRL-Scheduler [ICCAD’19]^★ pioneers the use of deep reinforcement learning in EDA.

News

[5/5/25] [Talk] I will visit the University of Edinburgh and give a talk on Allo on May 19. Thanks Jianyi for inviting me!
[4/7/25] [Talk] I will give a talk on Allo at the Jane Street Xcelerate Colloquium (JSXC) on May 16 in New York City.
[3/6/25] [Internship] I will join Google as a student researcher working on LLMs for compilers in Sunnyvale, CA this summer. See you in California!
[2/28/25] [Award] Our ARIES paper has been nominated as one of the Best Paper Candidates at FPGA’25. Congrats to the team!
[2/25/25] [Talk] Our presentation proposal on Allo has been accepted for the 2nd FPGA Developers’ Forum (FDF) at CERN.
[2/24/25] [Award] Niansong and I have been selected as Finalists for the 2025 Qualcomm Innovation Fellowship!
[2/17/25] [Service] Served as a reviewer for NeurIPS’25.
[1/26/25] [Talk] Niansong and I will give a talk on Allo at Mengjia’s Lab @ MIT. Thanks for the invitation!
[1/7/25] [Talk] I will give a guest lecture on Allo in ECE 8893 – Parallel Programming for FPGAs at Georgia Tech. Many thanks to Callie for the kind invitation!
[12/31/24] [Talk] Our presentation proposal on Allo has been accepted to C4ML’25@CGO. Let’s meet in Las Vegas!
[12/24/24] [Tutorial] Our tutorial proposal on Allo has been accepted to FPGA’25. We warmly invite you to join us in Monterey in late February to discuss and collaborate!
[12/12/24] [Service] Served as a reviewer for ICML’25.
[11/29/24] [Paper] Our paper on AIE programming has been accepted to FPGA’25! Congrats to all the coauthors!
[11/20/24] [Service] Served as an external reviewer for MLSys’25 and joined the OOPSLA’25 Artifact Evaluation Committee.
[10/16/24] [Talk] I passed the Examination for Admission to Candidacy (A Exam) and became a PhD candidate! Thanks for all the support!
[10/01/24] [Talk] Niansong and I will attend the annual review of the SRC JUMP 2.0 ACE Center in Chicago from Oct 1 to Oct 3 and give a presentation on Allo. See you there!
[08/24/24] [Service] Served as a reviewer for ICLR’25.
[08/22/24] [Talk] I gave a final presentation for my internship project on Automatic Warp Specialization for Hopper Architecture at NVIDIA. I will continue working on it as a part-time intern until November.
[07/01/24] [Talk] I will be attending the 2024 MLSys Rising Star workshop at the NVIDIA Headquarters in Santa Clara, CA from July 15 to July 16. See you in the Bay Area!
[06/27/24] [Award] I received 3rd place in the ACM SIGPLAN PLDI Student Research Competition (SRC).
[06/10/24] [Talk] I will give a talk on Slapo for distributed model training at ByteDance on Jun 14. Thanks Youjie for inviting me!
[05/16/24] [Award] I have been selected as one of the ML and Systems Rising Stars! Thanks for all the support!
[05/11/24] [Talk] Received the PLDI’24 Travel Grant. I will present our work on Allo in Copenhagen, Denmark at the end of June. Please come find me if you are around!
[05/10/24] [Talk] I will give a talk on LLM acceleration with Allo at UW SAMPL group on May 31. Thanks Keisuke for inviting me!
[04/30/24] [Paper] My paper on schedule reconstruction has been accepted to the PLDI’24 Student Research Competition (SRC)! I’ll present the poster at the PLDI conference in June.
[04/28/24] [Service] Joined OSDI’24/ATC’24 Artifact Evaluation Committee.
[04/08/24] [Talk] I will give a talk at the UIUC AMD-Xilinx Center of Excellence (HACC) seminar on Apr 10. Thanks Deming for inviting me!
[04/05/24] [Talk] Received the FCCM’24 Travel Grant. We will demonstrate Allo at the Demo Night. See you in Orlando at the beginning of May!
[03/31/24] [Paper] Our Allo paper has been fully accepted to PLDI’24! Code is open-source.
[03/20/24] [Paper] Our LLM-FPGA paper has been accepted to FCCM’24 journal track and will be published in TRETS!
[03/14/24] [Travel] Received the ASPLOS’24 Travel Grant. See you in San Diego at the end of April!
[03/05/24] [Award] Our HLS verification paper has received the Best Paper Award at FPGA’24!
[02/27/24] [Paper] Our Allo paper has been conditionally accepted to PLDI’24! Code is open-source.
[02/21/24] [Travel] I will be attending FPGA’24 from Mar 2-6 in Monterey, CA. Feel free to reach out if you want to chat!
[01/15/24] [Service] Joined PLDI’24 Artifact Evaluation Committee.
[12/27/23] [Internship] Received an internship offer from NVIDIA! I’ll join the NVIDIA deep learning compiler team in Summer 2024.
[12/10/23] [Paper] Our HLS verification paper has been accepted to FPGA’24. Congrats to all the coauthors!
[11/07/23] [Paper] Our Slapo paper has been accepted to ASPLOS’24! Code is open-source.
[10/30/23] [Service] Joined OOPSLA’24 Artifact Evaluation Committee.
[09/12/23] [Talk] Attended SRC TECHCON in Austin and gave a talk on decoupled model schedule.

Publications

ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
Jinming Zhuang*, Shaojie Xiang*, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2025 (Best Paper Nominee) | [abs] | [bib]

Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen*, Niansong Zhang*, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
PLDI (ACM SIGPLAN Conference on Programming Language Design and Implementation), 2024 | [abs] | [bib] | Blog (Zhihu)

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
ACM TRETS (ACM Transactions on Reconfigurable Technology and Systems), 2024 (FCCM’24 Journal Track, IEEE International Symposium on Field-Programmable Custom Computing Machines) | [abs] | [bib] | Blog (Zhihu)

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
ASPLOS (ACM International Conference on Architectural Support for Programming Languages and Operating Systems), 2024 | [abs] | [bib] | Amazon Science

🏆 Formal Verification of Source-to-Source Transformations for HLS
Louis-Noël Pouchet, Emily Tucker, Niansong Zhang, Hongzheng Chen, Debjit Pal, Gabriel Rodríguez, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2024 (Best Paper Award) | [abs] | [bib]

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Tianfeng Liu*, Yangrui Chen*, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
NSDI (USENIX Symposium on Networked Systems Design and Implementation), 2023 | [abs] | [bib]

Accelerator Design with Decoupled Hardware Customizations: Benefits and Challenges
Debjit Pal, Yi-Hsiang Lai, Shaojie Xiang, Niansong Zhang, Hongzheng Chen, Jeremy Casas, Pasquale Cocchini, Zhenkun Yang, Jin Yang, Louis-Noël Pouchet, Zhiru Zhang
DAC (ACM/IEEE Design Automation Conference), 2022 (Invited Paper) | [abs] | [bib]

HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs
Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2022 | [abs] | [bib]

Krill: A Compiler and Runtime System for Concurrent Graph Processing
Hongzheng Chen, Minghua Shen, Nong Xiao, Yutong Lu
SC (International Conference for High Performance Computing, Networking, Storage and Analysis), 2021 | [abs] | [bib]

FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2021 (Best Paper Nominee) | [abs] | [bib]

Entropy-Directed Scheduling for FPGA High-Level Synthesis
Minghua Shen, Hongzheng Chen*, Nong Xiao
IEEE TCAD (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems), 2020 | [abs] | [bib]

A Deep-Reinforcement-Learning-Based Scheduler for FPGA HLS
Hongzheng Chen, Minghua Shen
ICCAD (IEEE/ACM International Conference on Computer-Aided Design), 2019 | [abs] | [bib]

Workshops / Preprints

HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Hongzheng Chen*, Yingheng Wang*, Yaohui Cai*, Hins Hu*, Jiajie Li*, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang
arXiv:2506.07972, 2025 | [abs] | [bib]

Allo: Catalyzing Accelerator Design and Programming for Machine Learning
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhiru Zhang
C4ML@CGO (Compilers for Machine Learning Workshop at the International Symposium on Code Generation and Optimization), 2025 | [abs] | [bib]

🥉 Uncovering Magic with Magic: Schedule Reconstruction from High-Performance Kernel Libraries
Hongzheng Chen
PLDI SRC (ACM SIGPLAN Conference on Programming Language Design and Implementation, Student Research Competition), 2024 (Bronze) | [abs] | [bib]

Structured Pruning is All You Need for Pruning CNNs at Initialization
Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
arXiv:2203.02549, 2022 | [abs] | [bib]

Education

Cornell University, US
Ph.D. in Computer Science, Aug. 2021 - Present
Thesis: Composable Programming Models for Accelerated Computing
Committee: Zhiru Zhang, Adrian Sampson, Mohamed Abdelfattah
Accumulated GPA: 4.0/4.0

Cornell University, US
M.S. in Computer Science, Aug. 2021 - Dec. 2024

Sun Yat-sen University, China
B.E. in Computer Science, Aug. 2017 - Jun. 2021
Overall GPA: 3.95/4.00 (Major GPA: 3.99/4.00)
Ranking: 1/188

Work Experience

Google, Sunnyvale, CA, US
Student Researcher, Compiler Team, May 2025 - Aug. 2025
Mentors: Mircea Trofin and Amir Yazdanbakhsh

NVIDIA, Redmond, WA, US
Machine Learning Compiler Research Intern, Deep Learning Compiler Technology Team, May 2024 - Nov. 2024
Mentors: Bin Fan and Vinod Grover

Amazon Web Services (AWS), Santa Clara, CA, US
Applied Scientist Intern, Deep Engine-Science Team, Aug. 2022 - Apr. 2023
Mentors: Cody Hao Yu, Shuai Zheng, and Yida Wang

ByteDance AI Lab, Beijing, China
Research Intern, MLSys Team, Applied Machine Learning (AML), Aug. 2020 - May 2021
Mentors: Jun He and Yibo Zhu

Scholarship

Qualcomm Innovation Fellowship Finalist, Qualcomm, 2025
SenseTime Scholarship (21 undergrads in China), SenseTime, 2020
Chinese National Scholarship ×2 (Top 1%), Ministry of Education of PRC, 2018-2020
First-Prize Scholarship ×3 (Top 5%), Sun Yat-sen University, 2017-2020
Samsung Scholarship (Top 1%), Samsung Electronics, 2017-2018

Talks

Allo: A Programming Model for Composable Accelerator Design
- 2nd FPGA Developers' Forum (FDF) @ CERN, Geneva, Switzerland, May 21, 2025
- ICSA @ University of Edinburgh, Edinburgh, UK, May 19, 2025
- Jane Street Xcelerate Colloquium, New York, May 16, 2025
- ACE Spring Meeting, Online, May 15, 2025
- CSL Retreat @ Cornell Tech, New York, NY, May 7, 2025
- ECE 8893 – Parallel Programming for FPGAs @ Georgia Tech (Guest lecture), Online, Mar 11, 2025
- C4ML workshop @ CGO'25, Las Vegas, NV, Mar 2, 2025
- FPGA'25, Monterey, CA, Mar 1, 2025
- CSAIL @ MIT, Online, Feb 19, 2025
- Google ML Compiler Systems Research Team, Online, Dec 12, 2024
- ACE Annual Review, Chicago, IL, Oct 1, 2024
- PLDI'24, Copenhagen, Denmark, Jun 28, 2024
- NVIDIA DL Compiler, Redmond, WA, May 29, 2024
- FCCM'24 Demo Night, Orlando, FL, May 6, 2024
Automatic Warp Specialization for Hopper Architecture
- NVIDIA DL Compiler, Redmond, WA, Aug 21, 2024
Accelerating Large Language Model Inference on FPGA with Allo
- UW SAMPL, Seattle, WA, May 31, 2024
- UIUC AMD-Xilinx Center of Excellence (HACC), Online, Apr 10, 2024
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
- FCCM'24, Orlando, FL, May 7, 2024
Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
- ByteDance AML, Bellevue, WA, Jun 14, 2024
- ASPLOS'24, San Diego, CA, May 1, 2024
- ACE Liaison Meeting, Online, Mar 5, 2024
- SRC TECHCON, Austin, TX, Sep 12, 2023
- ADAPT Lab @ UIUC, Online, Jul 17, 2023
- Spring ACE Center Meeting, Orlando, FL, Jun 22, 2023
- CSL Retreat @ Cornell, Ithaca, NY, May 12, 2023
- Pre-NSDI Systems Gathering @ BU, Boston, MA, Apr 16, 2023
- Amazon AI, Online, Apr 10, 2023
- TVMCon, Online, Mar 17, 2023
An MLIR-Based Intermediate Representation for Accelerator Design with Decoupled Hardware Customizations
- CRISP Liaison Meeting, Online, Sep 28, 2022
- MLIR Open Design Meeting, Online, Aug 11, 2022
Krill: A Compiler and Runtime System for Concurrent Graph Processing
- SC'21, St. Louis, MO, Nov 17, 2021
A Deep-Reinforcement-Learning-Based Scheduler for FPGA HLS
- ICCAD'19, Denver, CO, Nov 5, 2019

Hongzheng Chen 「陈鸿峥」