2025
|
Matcha: A Language and Compiler for Backtracking-based Subgraph Matching.
Yihua Wei, Lihan Hu, and Peng Jiang.
IEEE International Parallel and Distributed Processing
Symposium (IPDPS).
|
2025
|
A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-scale
Engine.
Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang, and
Franck Cappello.
IEEE International Parallel and Distributed Processing
Symposium (IPDPS).
|
2025
|
Improving Accuracy and Efficiency of Graph Embedding Training with Fine-grained
Parameter Management.
Lihan Hu and Peng Jiang.
IEEE International Parallel and Distributed Processing
Symposium (IPDPS).
|
2024
|
GCSM: GPU-Accelerated Continuous Subgraph
Matching for Large Graphs.
Yihua Wei and Peng Jiang.
IEEE International Parallel and Distributed Processing
Symposium (IPDPS).
|
2024
|
cuKE: An Efficient Code Generator for Score
Function Computation in Knowledge Graph Embedding.
Lihan Hu, Jing Li, and Peng Jiang.
IEEE International Parallel and Distributed Processing
Symposium (IPDPS).
|
2024
|
CereSZ: Enabling and Scaling Error-bounded Lossy
Compression on Cerebras CS-2.
Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu, Weijian Zheng, Sheng Di,
Qinglei Cao, Yunhe Feng, Zhen Xie, and Franck Cappello.
International Symposium on High-Performance Parallel
and Distributed Computing (HPDC).
|
2023
|
PIMMiner: A High-performance PIM Architecture-aware
Graph Mining Framework.
Jiya Su, Peng Jiang, and Rujia Wang.
CoRR.
|
2023
|
End-to-End LU Factorization of Large Matrices on
GPUs.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
ACM SIGPLAN Annual Symposium on Principles and Practice
of Parallel Programming (PPoPP).
|
2022
|
STMatch: Accelerating Graph Pattern
Matching on GPU with Stack-Based Loop Optimizations.
Yihua Wei and Peng Jiang.
International Conference on High Performance Computing,
Networking, Storage and Analysis (SC).
|
2022
|
SampleMine: A Framework for Applying Random Sampling
to Subgraph Pattern Mining through Loop Perforation.
Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, and Bo Wu.
International Conference on Parallel Architectures and
Compilation Techniques (PACT).
|
2022
|
Exposing and Exploiting Fine-Grained Block
Structures for Fast and Accurate Sparse Training.
Peng Jiang, Lihan Hu, and Shihui Song.
Advances in Neural Information Processing Systems
(NeurIPS).
|
2022
|
Scaling and Selecting GPU Methods for All Pairs
Shortest Paths Computations.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
2022 IEEE International Parallel and Distributed
Processing Symposium (IPDPS).
|
2022
|
Rethinking Graph Data Placement for Graph Neural
Network Training on Multiple GPUs.
Shihui Song and Peng Jiang.
International Conference on Supercomputing
(ICS).
|
2021
|
Scaling Sparse Matrix Multiplication on CPU-GPU
Nodes.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
2021 IEEE International Parallel and Distributed
Processing Symposium (IPDPS).
|
2021
|
Exploring PIM Architecture for High-Performance
Graph Pattern Mining.
Jiya Su, Linfeng He, Peng Jiang, and Rujia Wang.
IEEE Computer Architecture Letters 20(2).
|
2021
|
Communication-Efficient Sampling for Distributed Training of
Graph Convolutional Networks.
Peng Jiang and Masuma Akter Rumi.
CoRR.
|
2020
|
Scaling out speculative execution of finite-state
machines with parallel merge.
Yang Xia, Peng Jiang, and Gagan Agrawal.
25th ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming (PPoPP).
|
2020
|
A novel data transformation and execution strategy for
accelerating sparse matrix multiplication on GPUs.
Peng Jiang, Changwan Hong, and Gagan Agrawal.
25th ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming (PPoPP).
|
2020
|
Accelerating Sparse CNN Inference on GPUs with
Performance-Aware Weight Pruning.
Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, and Peng Jiang.
International Conference on Parallel Architectures and
Compilation Techniques (PACT).
|
2020
|
Adaptive Periodic Averaging: A Practical Approach to Reducing
Communication in Distributed Learning.
Peng Jiang and Gagan Agrawal.
CoRR.
|
2019
|
A Methodology for Characterizing Sparse Datasets and
Its Application to SIMD Performance Prediction.
Gangyi Zhu, Peng Jiang, and Gagan Agrawal.
28th International Conference on Parallel Architectures
and Compilation Techniques (PACT).
|
2019
|
Enabling prefix sum parallelism pattern for
recurrences with principled function reconstruction.
Yang Xia, Peng Jiang, and Gagan Agrawal.
International Conference on Compiler Construction
(CC).
|
2018
|
Revealing parallel scans and reductions in recurrences
through function reconstruction.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
International Conference on Parallel Architectures and
Compilation Techniques (PACT).
|
2018
|
Conflict-free vectorization of associative irregular
applications with recent SIMD architectural advances.
Peng Jiang and Gagan Agrawal.
International Symposium on Code Generation and
Optimization (CGO).
|
2018
|
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and
Quantized Communication.
Peng Jiang and Gagan Agrawal.
Advances in Neural Information Processing Systems
(NeurIPS).
|
2017
|
Efficient SIMD and MIMD parallelization of hash-based
aggregation by conflict mitigation.
Peng Jiang and Gagan Agrawal.
International Conference on Supercomputing
(ICS).
|
2017
|
Combining SIMD and Many/Multi-core Parallelism for
Finite State Machines with Enumerative Speculation.
Peng Jiang and Gagan Agrawal.
ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming (PPoPP).
|
2016
|
Exploiting recent SIMD architectural advances for
irregular applications.
Linchuan Chen, Peng Jiang, and Gagan Agrawal.
International Symposium on Code Generation and
Optimization (CGO).
|
2016
|
Reusing Data Reorganization for Efficient SIMD
Parallelization of Adaptive Irregular Applications.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
International Conference on Supercomputing
(ICS).
|