Publications

My advisees are underlined.

2026	Designing Domain-Specific Compilers for Lossy Compression: A Case Study on Wafer-Scale Engine. Shihui Song, Robert Underwood, Sheng Di, Peng Jiang, and Franck Cappello. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2026	I/O-Aware PIM Acceleration for Long-Sequence LLM Inference with Hybrid Sparse Attention. Xiaoyang Lu, Lihan Hu, Hongrui Huang, Peng Jiang, and Xian-He Sun. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2026	DCSM: Enabling Inter-Batch Parallelism for Continuous Subgraph Matching on GPU. Yihua Wei and Peng Jiang. International Conference on Supercomputing (ICS).
2025	Matcha: A Language and Compiler for Backtracking-based Subgraph Matching. Yihua Wei, Lihan Hu, and Peng Jiang. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2025	A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-scale Engine. Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang, and Franck Cappello. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2025	Improving Accuracy and Efficiency of Graph Embedding Training with Fine-grained Parameter Management. Lihan Hu and Peng Jiang. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2025	What to Support When You're Compressing: The State of Practice, Gaps, and Opportunities for Scientific Data Compression. Franck Cappello et al. International Conference on High Performance Computing, Networking, Storage and Analysis (SC).
2024	GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs. Yihua Wei and Peng Jiang. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2024	cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding. Lihan Hu, Jing Li, and Peng Jiang. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2024	CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu, Weijian Zheng, Sheng Di, Qinglei Cao, Yunhe Feng, Zhen Xie, and Franck Cappello. International Symposium on High-Performance Parallel and Distributed Computing (HPDC).
2023	PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework. Jiya Su, Peng Jiang, and Rujia Wang. CoRR.
2023	End-to-End LU Factorization of Large Matrices on GPUs. Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath. ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP).
2022	STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations. Yihua Wei and Peng Jiang. International Conference on High Performance Computing, Networking, Storage and Analysis (SC).
2022	SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation. Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, and Bo Wu. International Conference on Parallel Architectures and Compilation Techniques (PACT).
2022	Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training. Peng Jiang, Lihan Hu, and Shihui Song. Advances in Neural Information Processing Systems (NeurIPS).
2022	Scaling and Selecting GPU Methods for All Pairs Shortest Paths Computations. Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2022	Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs. Shihui Song and Peng Jiang. International Conference on Supercomputing (ICS).
2021	Scaling Sparse Matrix Multiplication on CPU-GPU Nodes. Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath. IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2021	Exploring PIM Architecture for High-Performance Graph Pattern Mining. Jiya Su, Linfeng He, Peng Jiang, and Rujia Wang. IEEE Computer Architecture Letters 20(2).
2021	Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks. Peng Jiang and Masuma Akter Rumi. CoRR.
2020	Scaling out speculative execution of finite-state machines with parallel merge. Yang Xia, Peng Jiang, and Gagan Agrawal. 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2020	A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs. Peng Jiang, Changwan Hong, and Gagan Agrawal. 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2020	Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning. Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, and Peng Jiang. International Conference on Parallel Architectures and Compilation Techniques (PACT).
2020	Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning. Peng Jiang and Gagan Agrawal. CoRR.
2019	A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction. Gangyi Zhu, Peng Jiang, and Gagan Agrawal. 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
2019	Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction. Yang Xia, Peng Jiang, and Gagan Agrawal. International Conference on Compiler Construction (CC).
2018	Revealing parallel scans and reductions in recurrences through function reconstruction. Peng Jiang, Linchuan Chen, and Gagan Agrawal. International Conference on Parallel Architectures and Compilation Techniques (PACT).
2018	Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances. Peng Jiang and Gagan Agrawal. International Symposium on Code Generation and Optimization (CGO).
2018	A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication. Peng Jiang and Gagan Agrawal. Advances in Neural Information Processing Systems (NeurIPS).
2017	Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation. Peng Jiang and Gagan Agrawal. International Conference on Supercomputing (ICS).
2017	Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation. Peng Jiang and Gagan Agrawal. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2016	Exploiting recent SIMD architectural advances for irregular applications. Linchuan Chen, Peng Jiang, and Gagan Agrawal. International Symposium on Code Generation and Optimization (CGO).
2016	Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications. Peng Jiang, Linchuan Chen, and Gagan Agrawal. International Conference on Supercomputing (ICS).