Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Type: Preprint

Publication Date: 2017-12-26

Citations: 10

Locations

  • arXiv (Cornell University)

Similar Works

  • Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer (2017). Amrita Mathuriya, Thorsten Kurth, Vivek Rane, Mustafa Mustafa, Lei Shao, Debbie Bard, Prabhat, Victor W. Lee
  • Throughput Prediction of Asynchronous SGD in TensorFlow (2020). Zhuojin Li, Wumo Yan, Marco Paolieri, Leana Golubchik
  • Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation (2019). Ammar Ahmad Awan, Jeroen Bedorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda
  • FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters (2016). Forrest Iandola, Matthew W. Moskewicz, Khalid Ashraf, Kurt Keutzer
  • User-transparent Distributed TensorFlow (2017). Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily
  • FireCaffe: near-linear acceleration of deep neural network training on compute clusters (2015). Forrest Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer
  • PyTorch Distributed: Experiences on Accelerating Data Parallel Training (2020). Li Shen, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania
  • On Scale-out Deep Learning Training for Cloud and HPC (2018). Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul
  • MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning (2018). Amith R. Mamidala, Γεώργιος Κόλλιας, Chris Ward, Fausto Artico
  • HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow (2019). Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda
  • swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight (2018). Liandeng Li, Jiarui Fang, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang
  • Image Classification at Supercomputer Scale (2018). Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng
  • TensorFlow Doing HPC (2019). Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter
  • swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight (2019). Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang
  • PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (2023). Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer
  • Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs (2018). Shaohuai Shi, Qiang Wang, Xiaowen Chu