Quentin Anthony | Publications

* denotes equal contribution

An up-to-date list is available on Google Scholar

2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Black, Sidney, Biderman, Stella, Hallahan, Eric, Anthony, Quentin, Gao, Leo, Golding, Laurence, He, Horace, Leahy, Connor, McDonell, Kyle, Phang, Jason, Pieler, Michael, Prashanth, USVSN Sai, Purohit, Shivanshu, Reynolds, Laria, Tow, Jonathan, Wang, Ben, and Weinbach, Samuel

In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models May 2022

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B’s architecture and training, and evaluate its performance. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.

Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads

Zhou, Qinghua, Anthony, Quentin, Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K.

In 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC) May 2022

Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems

Chen, Chen-Chun, Khorassani, Kawthar Shafie, Anthony, Quentin G., Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K.

In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) May 2022

2021

Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences

Anthony, Quentin, Xu, Lang, Subramoni, Hari, and Panda, Dhabaleswar K

In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) May 2021

Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems

Khorassani, Kawthar Shafie, Chu, Ching-Hsiang, Anthony, Quentin G, Subramoni, Hari, and Panda, Dhabaleswar K

In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) May 2021

Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters

Kousha, Pouya, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K

arXiv preprint arXiv:2109.08329 May 2021

Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training

Anthony, Quentin, and Dai, Donglai

In SC Workshops Supplementary Proceedings (SCWS) May 2021

2020

HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow

Awan, Ammar Ahmad, Jain, Arpan, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K

In International Conference on High Performance Computing May 2020

Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR

Anthony, Quentin, Awan, Ammar Ahmad, Jain, Arpan, Subramoni, Hari, and Panda, Dhabaleswar K

In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) May 2020

GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training

Jain, Arpan, Awan, Ammar Ahmad, Aljuhani, Asmaa M, Hashmi, Jahanzeb Maqbool, Anthony, Quentin G, Subramoni, Hari, Panda, Dhabaleswar K, Machiraju, Raghu, and Parwani, Anil

In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis May 2020

Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR

Ghazimirsaeed, S Mahdieh*, Anthony, Quentin*, Shafi, Aamir, Subramoni, Hari, and Panda, Dhabaleswar K

In 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S) May 2020

2019

Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters

Jain, Arpan, Awan, Ammar Ahmad, Anthony, Quentin, Subramoni, Hari, and Panda, Dhabaleswar K

In 2019 IEEE International Conference on Cluster Computing (CLUSTER) May 2019