Publications

* denotes equal contribution

An up-to-date list is available on Google Scholar.

2022

  1. GPT-NeoX-20B: An Open-Source Autoregressive Language Model
    Black, Sidney, Biderman, Stella, Hallahan, Eric, Anthony, Quentin, Gao, Leo, Golding, Laurence, He, Horace, Leahy, Connor, McDonell, Kyle, Phang, Jason, Pieler, Michael, Prashanth, Usvsn Sai, Purohit, Shivanshu, Reynolds, Laria, Tow, Jonathan, Wang, Ben, and Weinbach, Samuel
    In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, 2022
  2. Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads
    Zhou, Qinghua, Anthony, Quentin, Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2022
  3. Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems
    Chen, Chen-Chun, Khorassani, Kawthar Shafie, Anthony, Quentin G., Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

2021

  1. Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences
    Anthony, Quentin, Xu, Lang, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
  2. Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems
    Khorassani, Kawthar Shafie, Chu, Ching-Hsiang, Anthony, Quentin G., Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2021
  3. Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters
    Kousha, Pouya, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K.
    arXiv preprint arXiv:2109.08329, 2021
  4. Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training
    Anthony, Quentin, and Dai, Donglai
    In SC Workshops Supplementary Proceedings (SCWS), 2021

2020

  1. HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow
    Awan, Ammar Ahmad, Jain, Arpan, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K.
    In International Conference on High Performance Computing, 2020
  2. Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR
    Anthony, Quentin, Awan, Ammar Ahmad, Jain, Arpan, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2020
  3. GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training
    Jain, Arpan, Awan, Ammar Ahmad, Aljuhani, Asmaa M., Hashmi, Jahanzeb Maqbool, Anthony, Quentin G., Subramoni, Hari, Panda, Dhabaleswar K., Machiraju, Raghu, and Parwani, Anil
    In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
  4. Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR
    Ghazimirsaeed, S. Mahdieh*, Anthony, Quentin*, Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), 2020

2019

  1. Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters
    Jain, Arpan, Awan, Ammar Ahmad, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K.
    In 2019 IEEE International Conference on Cluster Computing (CLUSTER), 2019