I am a Research Scientist at ByteDance Seed, working on LLM infrastructure, heterogeneous accelerators, and agentic frameworks. I received my Ph.D. from the CS department at Cornell University, advised by Prof. Zhiru Zhang, and my B.E. in Computer Science with highest honors from Sun Yat-sen University.

I have been fortunate to intern at Google DeepMind, NVIDIA, AWS, and ByteDance, contributing to projects in large-scale machine learning systems: Slapo [ASPLOS’24] with AWS, BGL [NSDI’23] with ByteDance, Magellan [C4ML @ CGO’26] with Google DeepMind, and Tawa [CGO’26] with NVIDIA. I have also closely collaborated with three major hardware vendors, Intel, AMD, and NVIDIA, on various compiler projects: HeteroCL-MLIR [DAC’22] with Intel, Allo [PLDI’24] with AMD, and Tawa [CGO’26] with NVIDIA. My research has been recognized with three Best Paper nominations and a Best Paper Award at top-tier hardware conferences. I was also named a 2024 ML and Systems Rising Star.

Research Highlights

My research focuses on building compilers, programming languages, and accelerators for large-scale machine learning workloads, with an emphasis on large language models (LLMs). In particular, I aim to build performant and scalable systems that enable programmers to harness heterogeneous hardware (GPUs/TPUs/NPUs) for emerging machine learning applications (e.g., GenAI) more productively.

indicates projects where I am the project lead

Accelerator Programming Frameworks:

  • Tawa [CGO’26] first introduces automatic warp specialization to generate efficient LLM kernels such as FlashAttention-3/4 on NVIDIA Hopper and Blackwell GPUs. The proposed NVWS dialect has been merged upstream into OpenAI Triton.
  • Slapo [ASPLOS’24] is a distributed LLM pre-training framework deployed at AWS, designed to balance usability and performance. It has influenced the design of ByteDance’s veScale and Meta’s TorchTitan.
  • BGL [NSDI’23] is a production-scale GNN training framework used at ByteDance, reducing billion-node graph training time from weeks to days.

Accelerator Design Languages:

Accelerator Architectures for ML:

ML for Systems:

News

  • [3/22/26] [Event] I will attend the ASPLOS’26 conference and the LATTE workshop in Pittsburgh, PA. Feel free to reach out if you are around!
  • [3/18/26] [Defense] I have successfully defended my PhD dissertation! Thanks for all the support from my advisors, collaborators, and friends!
  • [1/26/26] [Paper] Our paper on HeuriGym, an agentic benchmark for LLM-crafted heuristics in combinatorial optimization, has been accepted to ICLR’26! Congrats to all the coauthors!
  • [1/23/26] [Talk] I will give a guest lecture in ENSC453: Programming for Heterogeneous Computing at Simon Fraser University. Many thanks to Yingjie for the invitation!
  • [1/9/26] [Talk] I will give a guest lecture in ECE 8893 – Parallel Programming for FPGAs at Georgia Tech. Many thanks to Callie for the invitation again!
  • [12/13/25] [Paper] Our presentation proposal on Magellan, an agentic framework for autonomous discovery of novel compiler optimization heuristics, has been accepted to the C4ML workshop @ CGO’26 in January 2026.
  • [12/1/25] [Event] I will attend the NeurIPS’25 conference in San Diego, CA from Dec 1 to Dec 7 and present our work on HeuriGym at the Math-AI workshop. Feel free to reach out if you are around!
  • [11/25/25] [Talk] I will give a guest lecture in COMP 468/568 – Deep Learning Systems at Rice University. Many thanks to Yuke for the kind invitation!
  • [11/22/25] [Tutorial] Our tutorial proposal on “RAIC: Reconfigurable AI Computing” has been accepted to ASPLOS’26. Thanks Jianming for leading the effort! We warmly invite you to join us in Pittsburgh in late March to discuss and collaborate!
  • [11/20/25] [Paper] Our paper on automatic warp specialization for NVIDIA GPUs has been accepted to CGO’26! Congrats to all the coauthors!
  • [11/18/25] [Service] Served as an external reviewer of MLSys’26.
  • [10/30/25] [Event] I will attend the first Meta PhD Forum at Meta headquarters. Looking forward to connecting!
  • [10/29/25] [Talk] I will give a talk on Automatic Warp Specialization for Modern GPUs to the Meta AI Compiler Team. Thanks Jie for inviting me!
  • [10/28/25] [Talk] I will give a talk on Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve at the LLVM Developers’ Meeting on October 28.
  • [10/17/25] [Paper] Our HeuriGym paper, an LLM benchmark for combinatorial optimization problems, has been accepted to the Math-AI workshop @ NeurIPS’25! Congrats to all the coauthors!
  • [10/8/25] [Talk] I gave a talk on HeuriGym to Cathy Wu’s group at MIT. Thanks for the invitation!
  • [9/3/25] [Service] Served as a reviewer of ICLR’26.
  • [8/22/25] [Internship] I finished my final internship presentation at Google! Thanks for all the support throughout the summer!
  • [8/8/25] [Award] I received the Travel Grant from the LLVM Foundation.
  • [8/1/25] [Talk] Our talk proposal on LLM for compilers has been accepted to the LLVM Developers’ Meeting in October.
  • [8/1/25] [Service] Served as a reviewer of AAAI’26.
  • [7/14/25] [Paper] Our paper on processing in memory has been accepted to MICRO’25! Congrats to all the coauthors!
  • [5/5/25] [Talk] I will visit University of Edinburgh and give a talk on Allo on May 19. Thanks Jianyi for inviting me!
  • [4/7/25] [Talk] I will give a talk on Allo at the Jane Street Xcelerate Colloquium (JSXC) on May 16 in New York City.
  • [3/6/25] [Internship] I will join Google as a student researcher working on LLM for compilers in Sunnyvale, CA during this summer. See you in California!
  • [2/28/25] [Award] Our ARIES paper has been nominated as one of the Best Paper Candidates at FPGA’25. Congrats to the team!
  • [2/25/25] [Talk] Our presentation proposal on Allo has been accepted for the 2nd FPGA Developers’ Forum (FDF) at CERN.
  • [2/24/25] [Award] Niansong and I have been selected as Finalists for the 2025 Qualcomm Innovation Fellowship!
  • [2/17/25] [Service] Served as a reviewer of NeurIPS’25.
  • [1/26/25] [Talk] Niansong and I will give a talk on Allo at Mengjia’s Lab @ MIT. Thanks for the invitation!
  • [1/7/25] [Talk] I will give a guest lecture on Allo in ECE 8893 – Parallel Programming for FPGAs at Georgia Tech. Many thanks to Callie for the kind invitation!
  • [12/31/24] [Talk] Our presentation proposal on Allo has been accepted to C4ML’25@CGO. Let’s meet in Las Vegas!
  • [12/24/24] [Tutorial] Our tutorial proposal on Allo has been accepted to FPGA’25. We warmly invite you to join us in Monterey in late February to discuss and collaborate!
  • [12/12/24] [Service] Served as a reviewer of ICML’25.
  • [11/29/24] [Paper] Our paper on AIE programming has been accepted to FPGA’25! Congrats to all the coauthors!
  • [11/20/24] [Service] Served as an external reviewer of MLSys’25 and joined the OOPSLA’25 Artifact Evaluation Committee.
  • [10/16/24] [Talk] I passed the Examination for Admission to Candidacy (A Exam) and became a PhD candidate! Thanks for all the support!
  • [10/01/24] [Talk] Niansong and I will attend the annual review of the SRC JUMP 2.0 ACE Center in Chicago from Oct 1 to Oct 3 and give a presentation on Allo. See you there!
  • [08/24/24] [Service] Served as a reviewer of ICLR’25.
  • [08/22/24] [Talk] I gave a final presentation for my internship project on Automatic Warp Specialization for Hopper Architecture at NVIDIA. I will continue working on it as a part-time intern until November.
  • [07/01/24] [Talk] I will be attending the 2024 MLSys Rising Star workshop at NVIDIA headquarters in Santa Clara, CA from July 15 to July 16. See you in the Bay Area!
  • [06/27/24] [Award] I received 3rd place in the ACM SIGPLAN PLDI Student Research Competition (SRC).
  • [06/10/24] [Talk] I will give a talk on Slapo for distributed model training at ByteDance on Jun 14. Thanks Youjie for inviting me!
  • [05/16/24] [Award] I have been selected as one of the ML and Systems Rising Stars! Thanks for all the support!
  • [05/11/24] [Talk] Received the PLDI’24 Travel Grant. I will present our work on Allo in Copenhagen, Denmark at the end of June. Please come find me if you are around!
  • [05/10/24] [Talk] I will give a talk on LLM acceleration with Allo at UW SAMPL group on May 31. Thanks Keisuke for inviting me!
  • [04/30/24] [Paper] My paper on schedule reconstruction has been accepted to the PLDI’24 Student Research Competition (SRC)! I’ll present the poster at the PLDI conference in June.
  • [04/28/24] [Service] Joined OSDI’24/ATC’24 Artifact Evaluation Committee.
  • [04/08/24] [Talk] I will give a talk at the UIUC AMD-Xilinx Center of Excellence (HACC) seminar on Apr 10. Thanks Deming for inviting me!
  • [04/05/24] [Talk] Received the FCCM’24 Travel Grant. We will demonstrate Allo at the Demo Night. See you in Orlando at the beginning of May!
  • [03/31/24] [Paper] Our Allo paper has been fully accepted to PLDI’24! Code is open-source.
  • [03/20/24] [Paper] Our LLM-FPGA paper has been accepted to FCCM’24 journal track and will be published in TRETS!
  • [03/14/24] [Travel] Received the ASPLOS’24 Travel Grant. See you in San Diego at the end of April!
  • [03/05/24] [Award] Our HLS verification paper has received the Best Paper Award at FPGA’24!
  • [02/27/24] [Paper] Our Allo paper has been conditionally accepted to PLDI’24! Code is open-source.
  • [02/21/24] [Travel] I will be attending FPGA’24 from Mar 2-6 in Monterey, CA. Feel free to reach out if you want to chat!
  • [01/15/24] [Service] Joined PLDI’24 Artifact Evaluation Committee.
  • [12/27/23] [Internship] Received the internship offer from NVIDIA! I’ll join the NVIDIA deep learning compiler team in summer 2024.
  • [12/10/23] [Paper] Our HLS verification paper has been accepted to FPGA’24. Congrats to all the coauthors!
  • [11/07/23] [Paper] Our Slapo paper has been accepted to ASPLOS’24! Code is open-source.
  • [10/30/23] [Service] Joined OOPSLA’24 Artifact Evaluation Committee.
  • [09/12/23] [Talk] Attended SRC TECHCON in Austin and gave a talk on decoupled model schedule.

Publications

* indicates equal contribution

2026

HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Hongzheng Chen*, Yingheng Wang*, Yaohui Cai*, Hins Hu*, Jiajie Li*, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang
ICLR (International Conference on Learning Representations), 2026

Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
Hongzheng Chen, Bin Fan, Alexander Collins, Bastian Hagedorn, Evghenii Gaburov, Masahiro Masuda, Matthew Brookhart, Chris Sullivan, Jason Knight, Zhiru Zhang, Vinod Grover
CGO (IEEE/ACM International Symposium on Code Generation and Optimization), 2026

2025

Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device
Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang
MICRO (The International Symposium on Microarchitecture), 2025

🎗️ ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
Jinming Zhuang*, Shaojie Xiang*, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2025 (Best Paper Nominee)

2024

Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen*, Niansong Zhang*, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
PLDI (ACM SIGPLAN Conference on Programming Language Design and Implementation), 2024

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
ACM TRETS (ACM Transactions on Reconfigurable Technology and Systems), 2024 (Journal Track of FCCM’24, the IEEE International Symposium on Field-Programmable Custom Computing Machines)

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
ASPLOS (ACM International Conference on Architectural Support for Programming Languages and Operating Systems), 2024

🏆 Formal Verification of Source-to-Source Transformations for HLS
Louis-Noël Pouchet, Emily Tucker, Niansong Zhang, Hongzheng Chen, Debjit Pal, Gabriel Rodríguez, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2024 (Best Paper Award)

2023

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Tianfeng Liu*, Yangrui Chen*, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
NSDI (USENIX Symposium on Networked Systems Design and Implementation), 2023

2022

Accelerator Design with Decoupled Hardware Customizations: Benefits and Challenges
Debjit Pal, Yi-Hsiang Lai, Shaojie Xiang, Niansong Zhang, Hongzheng Chen, Jeremy Casas, Pasquale Cocchini, Zhenkun Yang, Jin Yang, Louis-Noël Pouchet, Zhiru Zhang
DAC (ACM/IEEE Design Automation Conference), 2022 (Invited Paper)

HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs
Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2022

2021

Krill: A Compiler and Runtime System for Concurrent Graph Processing
Hongzheng Chen, Minghua Shen, Nong Xiao, Yutong Lu
SC (International Conference for High Performance Computing, Networking, Storage and Analysis), 2021

🎗️ FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang
FPGA (ACM/SIGDA International Symposium on Field-Programmable Gate Arrays), 2021 (Best Paper Nominee)

2020

Entropy-Directed Scheduling for FPGA High-Level Synthesis
Minghua Shen, Hongzheng Chen*, Nong Xiao
IEEE TCAD (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems), 2020

2019

A Deep-Reinforcement-Learning-Based Scheduler for FPGA HLS
Hongzheng Chen, Minghua Shen
ICCAD (IEEE/ACM International Conference on Computer-Aided Design), 2019

Workshops / Preprints

Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve
Hongzheng Chen, Alexander Novikov, Ngân (NV) Vũ, Hanna Alam, Zhiru Zhang, Aiden Grossman, Mircea Trofin, Amir Yazdanbakhsh
C4ML@CGO (Compilers for Machine Learning Workshop at the International Symposium on Code Generation and Optimization), 2026

Dato: A Task-Based Programming Model for Dataflow Accelerators
Shihan Fang*, Hongzheng Chen*, Niansong Zhang, Jiajie Li, Han Meng, Adrian Liu, Zhiru Zhang
arXiv:2509.06794, 2025

Allo: Catalyzing Accelerator Design and Programming for Machine Learning
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhiru Zhang
C4ML@CGO (Compilers for Machine Learning Workshop at the International Symposium on Code Generation and Optimization), 2025

🥉 Uncovering Magic with Magic: Schedule Reconstruction from High-Performance Kernel Libraries
Hongzheng Chen
PLDI SRC (ACM SIGPLAN Conference on Programming Language Design and Implementation Student Research Competition), 2024 (Bronze)

Structured Pruning is All You Need for Pruning CNNs at Initialization
Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
arXiv:2203.02549, 2022

Professional Service

Journal Reviewer: TC, TCAD, TACO, TRETS, TODAES, JETCAS, JSC
Workshop / Tutorial Organizer: RAIC @ ASPLOS'26, Allo Tutorial @ FPGA'25

Awards & Honors

2021 Outstanding Undergraduate Thesis Award · Sun Yat-sen University
2020 CCF Elite Collegiate Award · China Computer Federation (CCF) · 98 undergrads in China
2019 IEEE EDAthon 2nd Place · CEDA HK

Scholarship

2020 SenseTime Scholarship · SenseTime · 21 undergrads in China
2018–20 Chinese National Scholarship ×2 · Ministry of Education of PRC · Top 1%
2017–20 First-Prize Scholarship ×3 · Sun Yat-sen University · Top 5%
2017–18 Samsung Scholarship · Samsung Electronics · Top 1%

Talks

Composable Programming for AI Scaling (2 talks)
Automatic Warp Specialization for Modern GPUs with Asynchronous References (2 talks)
  • Meta AI Compiler Team, Menlo Park, CA, Oct 29, 2025
  • NVIDIA DL Compiler, Redmond, WA, Aug 21, 2024
Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve (4 talks)
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization (1 talk)
Dato: A Task-Based Programming Model for Dataflow Accelerators (2 talks)
Allo: A Programming Model for Composable Accelerator Design (16 talks)
Accelerating Large Language Model Inference on FPGA with Allo (2 talks)
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference (1 talk)
  • FCCM'24, Orlando, FL, May 7, 2024
Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training (10 talks)
An MLIR-Based Intermediate Representation for Accelerator Design with Decoupled Hardware Customizations (2 talks)
Krill: A Compiler and Runtime System for Concurrent Graph Processing (1 talk)
  • SC'21, St. Louis, MO, Nov 17, 2021
A Deep-Reinforcement-Learning-Based Scheduler for FPGA HLS (1 talk)