I am a Ph.D. candidate in the Computer Systems Laboratory at Cornell University, advised by Prof. Zhiru Zhang. I received my B.E. in Computer Science with highest honors from Sun Yat-sen University.

I have been fortunate to intern at Google, NVIDIA, AWS, and ByteDance, where I worked on research in machine learning systems. I was also selected as a 2024 ML and Systems Rising Star.

Research Highlights

My research focuses on building compilers, programming systems, and accelerators for large-scale machine learning workloads, with an emphasis on large language models (LLMs). In particular, I aim to bridge the productivity-performance gap between emerging machine learning applications (e.g., generative AI) and heterogeneous accelerators.


Accelerator Programming Frameworks:

  • AutoWS introduces automatic warp specialization to discover efficient LLM kernels such as FlashAttention on NVIDIA Hopper and Blackwell GPUs. The proposed NVWS dialect has been merged upstream into OpenAI Triton.
  • Slapo [ASPLOS’24] is a distributed LLM training framework deployed at AWS that balances usability and performance by decoupling optimizations (a schedule) from the model definition; a minimal sketch of this idea follows this list. It has influenced the design of ByteDance’s veScale and Meta’s TorchTitan.
  • BGL [NSDI’23] is a production-scale GNN training framework used at ByteDance, reducing billion-node graph training time from weeks to days.
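
A minimal sketch of the schedule-decoupled idea behind Slapo, shown on a toy PyTorch module. The primitive names used here (slapo.create_schedule, shard, sync, slapo.build) follow my reading of the open-source Slapo repository and should be treated as illustrative assumptions rather than the definitive API.

    import torch.nn as nn
    import slapo  # assumes the open-source Slapo package is installed

    class MLP(nn.Module):
        def __init__(self, d=1024):
            super().__init__()
            self.fc1 = nn.Linear(d, 4 * d)
            self.fc2 = nn.Linear(4 * d, d)

        def forward(self, x):
            return self.fc2(nn.functional.gelu(self.fc1(x)))

    # The schedule is created apart from the model definition, so the
    # parallelization choices never touch the modeling code itself.
    sch = slapo.create_schedule(MLP())
    sch["fc1"].shard("weight", axis=0)   # column-parallel first projection (assumed primitive)
    sch["fc2"].shard("weight", axis=1)   # row-parallel second projection (assumed primitive)
    sch["fc2"].sync(mode="fwd_post", sync_op_or_fn="all_reduce")  # assumed sync primitive
    opt_model, _ = slapo.build(sch)      # materialize the optimized model (return signature assumed)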

Accelerator Design Languages: Allo [PLDI’24] is a Python/MLIR-based programming language for efficient ML accelerator design. It has been adopted by 10+ universities and companies, including Cornell, UCLA, UIUC, Brown, UofT, UVA, UChicago, UIC, Imperial, Tsinghua, SJTU, Intel, AMD, and Microsoft. Allo integrates multiple supporting tools, including the PEQC [FPGA’24, Best Paper Award] equivalence checker, the HeteroFlow [FPGA’22] dataflow programming framework, and the ARIES [FPGA’25, Best Paper Nominee] backend for AMD AI Engines.
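
A minimal sketch of what an Allo kernel and its decoupled customizations can look like, assuming the open-source Allo package. The specific primitives shown (allo.customize, split, pipeline, build) reflect my understanding of the public API and are meant as an illustration rather than an authoritative reference.

    import allo
    from allo.ir.types import float32

    M, N, K = 32, 32, 32

    # Algorithm: a plain matrix multiply written with Python type annotations.
    def gemm(A: float32[M, K], B: float32[K, N]) -> float32[M, N]:
        C: float32[M, N] = 0.0
        for i, j, k in allo.grid(M, N, K):
            C[i, j] += A[i, k] * B[k, j]
        return C

    # Customizations: optimizations are composed separately from the algorithm.
    s = allo.customize(gemm)
    s.split("j", factor=8)      # tile the j loop (assumed primitive)
    s.pipeline("j.inner")       # pipeline the resulting inner loop (assumed primitive)
    mod = s.build()             # default CPU backend; a target such as "vhls" would emit HLS C++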

Accelerator Architectures for ML:

ML for Systems:

News

  • [5/5/25] [Talk] I will visit the University of Edinburgh and give a talk on Allo on May 19. Thanks Jianyi for inviting me!
  • [4/7/25] [Talk] I will give a talk on Allo at the Jane Street Xcelerate Colloquium (JSXC) on May 16 in New York City.
  • [3/6/25] [Internship] I will join Google as a student researcher working on LLMs for compilers in Sunnyvale, CA this summer. See you in California!
  • [2/28/25] [Award] Our ARIES paper has been nominated as one of the Best Paper Candidates at FPGA’25. Congrats to the team!
  • [2/25/25] [Talk] Our presentation proposal on Allo has been accepted for the 2nd FPGA Developers’ Forum (FDF) at CERN.
  • [2/24/25] [Award] Niansong and I have been selected as Finalists for the 2025 Qualcomm Innovation Fellowship!
  • [2/17/25] [Service] Served as a reviewer of NeurIPS’25.
  • [1/26/25] [Talk] Niansong and I will give a talk on Allo at Mengjia’s Lab @ MIT. Thanks for the invitation!
  • [1/7/25] [Talk] I will give a guest lecture on Allo in ECE 8893 – Parallel Programming for FPGAs at Georgia Tech. Many thanks to Callie for the kind invitation!
  • [12/31/24] [Talk] Our presentation proposal on Allo has been accepted to C4ML’25@CGO. Let’s meet in Las Vegas!
  • [12/24/24] [Tutorial] Our tutorial proposal on Allo has been accepted to FPGA’25. We warmly invite you to join us in Monterey in late February to discuss and collaborate!
  • [12/12/24] [Service] Served as a reviewer of ICML’25.
  • [11/29/24] [Paper] Our paper on AIE programming has been accepted to FPGA’25! Congrats to all the coauthors!
  • [11/20/24] [Service] Served as an external reviewer of MLSys’25 and joined the OOPSLA’25 Artifact Evaluation Committee.
  • [10/16/24] [Talk] I passed the Examination for Admission to Candidacy (A Exam) and became a PhD candidate! Thanks for all the support!
  • [10/01/24] [Talk] Niansong and I will attend the annual review of the SRC JUMP 2.0 ACE Center in Chicago from Oct 1 to Oct 3 and give a presentation on Allo. See you there!
  • [08/24/24] [Service] Served as a reviewer of ICLR’25.
  • [08/22/24] [Talk] I gave a final presentation for my internship project on Automatic Warp Specialization for Hopper Architecture at NVIDIA. I will continue working on it as a part-time intern until November.
  • [07/01/24] [Talk] I will be attending the 2024 MLSys Rising Star workshop at the NVIDIA headquarters in Santa Clara, CA from July 15 to July 16. See you in the Bay Area!
  • [06/27/24] [Award] I received 3rd place in the ACM SIGPLAN PLDI Student Research Competition (SRC).
  • [06/10/24] [Talk] I will give a talk on Slapo for distributed model training at ByteDance on Jun 14. Thanks Youjie for inviting me!
  • [05/16/24] [Award] I have been selected as one of the ML and Systems Rising Stars! Thanks for all the support!
  • [05/11/24] [Talk] Received the PLDI’24 Travel Grant. I will present our work on Allo in Copenhagen, Denmark at the end of June. Please come find me if you are around!
  • [05/10/24] [Talk] I will give a talk on LLM acceleration with Allo at UW SAMPL group on May 31. Thanks Keisuke for inviting me!
  • [04/30/24] [Paper] My paper on schedule reconstruction has been accepted to the PLDI’24 Student Research Competition (SRC)! I’ll present the poster at the PLDI conference in June.
  • [04/28/24] [Service] Joined OSDI’24/ATC’24 Artifact Evaluation Committee.
  • [04/08/24] [Talk] I will give a talk at the UIUC AMD-Xilinx Center of Excellence (HACC) seminar on Apr 10. Thanks Deming for inviting me!
  • [04/05/24] [Talk] Received the FCCM’24 Travel Grant. We will demonstrate Allo at the Demo Night. See you in Orlando at the beginning of May!
  • [03/31/24] [Paper] Our Allo paper has been fully accepted to PLDI’24! Code is open-source.
  • [03/20/24] [Paper] Our LLM-FPGA paper has been accepted to FCCM’24 journal track and will be published in TRETS!
  • [03/14/24] [Travel] Received the ASPLOS’24 Travel Grant. See you in San Diego at the end of April!
  • [03/05/24] [Award] Our HLS verification paper has received the Best Paper Award at FPGA’24!
  • [02/27/24] [Paper] Our Allo paper has been conditionally accepted to PLDI’24! Code is open-source.
  • [02/21/24] [Travel] I will be attending FPGA’24 from Mar 2-6 in Monterey, CA. Feel free to reach out if you want to chat!
  • [01/15/24] [Service] Joined PLDI’24 Artifact Evaluation Committee.
  • [12/27/23] [Internship] Received an internship offer from NVIDIA! I’ll join the NVIDIA deep learning compiler team in Summer 2024.
  • [12/10/23] [Paper] Our HLS verification paper has been accepted to FPGA’24. Congrats to all the coauthors!
  • [11/07/23] [Paper] Our Slapo paper has been accepted to ASPLOS’24! Code is open-source.
  • [10/30/23] [Service] Joined OOPSLA’24 Artifact Evaluation Committee.
  • [09/12/23] [Talk] Attended SRC TECHCON in Austin and gave a talk on decoupled model schedules.

Publications

ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
Jinming Zhuang*, Shaojie Xiang*, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou
ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2025 (Best Paper Nominee)

Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen*, Niansong Zhang*, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2024 | Blog (Zhihu)

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2024 (FCCM’24 Journal Track) | Blog (Zhihu)

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024 | Amazon Science

Formal Verification of Source-to-Source Transformations for HLS
Louis-Noël Pouchet, Emily Tucker, Niansong Zhang, Hongzheng Chen, Debjit Pal, Gabriel Rodríguez, Zhiru Zhang
ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2024 (Best Paper Award)

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Tianfeng Liu*, Yangrui Chen*, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2023

Accelerator Design with Decoupled Hardware Customizations: Benefits and Challenges
Debjit Pal, Yi-Hsiang Lai, Shaojie Xiang, Niansong Zhang, Hongzheng Chen, Jeremy Casas, Pasquale Cocchini, Zhenkun Yang, Jin Yang, Louis-Noël Pouchet, Zhiru Zhang
ACM/IEEE Design Automation Conference (DAC), 2022 (Invited Paper)

HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs
Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, Zhiru Zhang
ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2022

Krill: A Compiler and Runtime System for Concurrent Graph Processing
Hongzheng Chen, Minghua Shen, Nong Xiao, Yutong Lu
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021

FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang
ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2021 (Best Paper Nominee)

Entropy-Directed Scheduling for FPGA High-Level Synthesis
Minghua Shen, Hongzheng Chen*, Nong Xiao
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2020

A Deep-Reinforcement-Learning-Based Scheduler for FPGA HLS
Hongzheng Chen, Minghua Shen
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2019

Workshops / Preprints

HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Hongzheng Chen*, Yingheng Wang*, Yaohui Cai*, Hins Hu*, Jiajie Li*, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang
arXiv preprint, 2025

Allo: Catalyzing Accelerator Design and Programming for Machine Learning
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhiru Zhang
Compilers for Machine Learning Workshop at the International Symposium on Code Generation and Optimization (C4ML@CGO), 2025

Uncovering Magic with Magic: Schedule Reconstruction from High-Performance Kernel Libraries
Hongzheng Chen
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) Student Research Competition (SRC), 2024 (Bronze)

Structured Pruning is All You Need for Pruning CNNs at Initialization
Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
arXiv:2203.02549, 2022

Education

Cornell University, US
Ph.D. in Computer Science
Aug. 2021 - Present
Thesis: Composable Programming Models for Accelerated Computing
Committee: Zhiru Zhang, Adrian Sampson, Mohamed Abdelfattah
Cumulative GPA: 4.0/4.0
Cornell University, US
M.S. in Computer Science
Aug. 2021 - Dec. 2024
Sun Yat-sen University, China
B.E. in Computer Science
Aug. 2017 - Jun. 2021
Overall GPA: 3.95/4.00 (Major GPA: 3.99/4.00)
Ranking: 1/188

Work Experience

Google, Sunnyvale, CA, US
Student Researcher, Compiler Team
Mentors: Mircea Trofin and Amir Yazdanbakhsh
May 2025 - Aug. 2025
NVIDIA, Redmond, WA, US
Machine Learning Compiler Research Intern, Deep Learning Compiler Technology Team
Mentors: Bin Fan and Vinod Grover
May 2024 - Nov. 2024
Amazon Web Services (AWS), Santa Clara, CA, US
Applied Scientist Intern, Deep Engine-Science Team
Mentors: Cody Hao Yu, Shuai Zheng, and Yida Wang
Aug. 2022 - Apr. 2023
ByteDance AI Lab, Beijing, China
Research Intern, MLSys Team, Applied Machine Learning (AML)
Mentors: Jun He and Yibo Zhu
Aug. 2020 - May 2021

Teaching

Professional Service

Awards & Honors

Scholarships

  • Qualcomm Innovation Fellowship Finalist, Qualcomm, 2025
  • SenseTime Scholarship (awarded to 21 undergraduates across China), SenseTime, 2020
  • Chinese National Scholarship × 2 (Top 1%), Ministry of Education of PRC, 2018-2020
  • First-Prize Scholarship × 3 (Top 5%), Sun Yat-sen University, 2017-2020
  • Samsung Scholarship (Top 1%), Samsung Electronics, 2017-2018

Talks