Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

Abstract: RDMA over Converged Ethernet (RoCE) has gained significant traction in datacenter networks due to its compatibility with conventional Ethernet-based fabrics. However, the RDMA protocol is efficient only on (nearly) lossless networks, which makes congestion control vital on RoCE networks. Unfortunately, the native RoCE congestion control scheme, based on …