Online Job Scheduling in Distributed Machine Learning Clusters
Large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a prediction/inference model (e.g., a deep neural network) on a large dataset, multiple workers run in parallel, each training on a partition of the input dataset, and update shared …
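To make the data-parallel pattern above concrete, here is a minimal, hypothetical sketch. It assumes the shared state is a single model parameter updated synchronously with the averaged worker gradients, in the style of a parameter server. The abstract is truncated, so this is only an illustration and not the paper's system. The model (1-D linear regression `y = w * x`) and the helper names `partition`, `local_gradient`, and `train` are illustrative assumptions.

```python
# Hypothetical sketch: synchronous data-parallel SGD for 1-D linear regression.
# Assumption: each "worker" holds one partition of the data, computes a gradient
# on it, and a shared parameter w is updated with the averaged worker gradients.

def partition(data, num_workers):
    """Split data into num_workers contiguous, near-equal partitions."""
    size, rem = divmod(len(data), num_workers)
    parts, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def local_gradient(w, part):
    """Gradient w.r.t. w of the mean of 0.5 * (w * x - y)^2 over one partition."""
    return sum((w * x - y) * x for x, y in part) / len(part)


def train(data, num_workers=4, lr=0.005, steps=200):
    # Drop empty partitions (possible when num_workers > len(data)).
    parts = [p for p in partition(data, num_workers) if p]
    w = 0.0  # shared model parameter (assumed to live on a parameter server)
    for _ in range(steps):
        # In a real cluster these gradients are computed in parallel on
        # separate workers; here they are computed sequentially.
        grads = [local_gradient(w, p) for p in parts]
        # Synchronous update: average the worker gradients, then update w.
        w -= lr * sum(grads) / len(grads)
    return w


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 21)]  # true weight is 3.0
    print(train(data, num_workers=4))
```

Synchronous averaging keeps every worker on the same parameter version. The trade-off is that each step waits for the slowest worker, which is one reason placement and scheduling of such jobs matter in a shared cluster.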