Publications
* denotes equal contribution
An up-to-date list is available on Google Scholar
2022
-
GPT-NeoX-20B: An Open-Source Autoregressive Language Model. In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, May 2022
-
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. In 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), May 2022
-
Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2022
2021
-
Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2021
-
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2021
-
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters. arXiv preprint arXiv:2109.08329, May 2021
-
Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training. In SC Workshops Supplementary Proceedings (SCWS), May 2021
2020
-
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. In International Conference on High Performance Computing, May 2020
-
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020
-
GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, May 2020
-
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR. In 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), May 2020
2019
-
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. In 2019 IEEE International Conference on Cluster Computing (CLUSTER), May 2019