DiLoCo Bandwidth Simulator

Based on the papers: DiLoCo, Streaming DiLoCo, Scaling Laws for DiLoCo

Model & Network Configuration

N (number of parameters)
Seconds
M (≥1)

Data Parallel (H=1)

DiLoCo

Streaming DiLoCo with Overlap

Streaming pattern: H is the synchronization period for each fragment. With P fragments, the fragments are staggered so that one fragment is sent every H / P steps (5.0 steps with the current settings). Fragment size = total_size / P.
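The staggered schedule can be sketched in a few lines of Python. This is only an illustration, not the simulator's own code: the function name fragment_schedule, the round-robin fragment order, and the example values H = 100 and P = 20 (chosen because they give the 5.0-step interval) are assumptions.

```python
def fragment_schedule(H, P, total_size, num_steps):
    """Sketch of a staggered fragment schedule (assumed round-robin order).

    H          -- synchronization period of each fragment, in steps
    P          -- number of fragments
    total_size -- size of the full model payload (any unit)
    num_steps  -- how many training steps to lay out
    """
    interval = H / P                 # steps between consecutive sends
    fragment_size = total_size / P   # each send carries one fragment
    events = []
    k = 1
    while k * interval <= num_steps:
        step = k * interval
        fragment = (k - 1) % P       # assumption: fragments are sent in order
        events.append((step, fragment, fragment_size))
        k += 1
    return interval, fragment_size, events


# Hypothetical settings: H = 100 steps, P = 20 fragments, payload of 40 units.
interval, fragment_size, events = fragment_schedule(100, 20, 40.0, 20)
for step, fragment, size in events:
    print(f"step {step:5.1f}: send fragment {fragment} ({size} units)")
```

Each fragment comes around again after P sends, which is H steps, so the per-fragment period stays H while the link carries only total_size / P at a time.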

Compute Utilization

For all methods, the reduction of gradients or outer gradients is overlapped with the backward pass computation.
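A rough way to reason about compute utilization is to compare useful compute time with the communication time that the overlap fails to hide. The sketch below is a simplified model under stated assumptions, not the simulator's actual formula: it assumes a ring all-reduce (each replica moves about 2(M-1)/M times the payload), a fixed overlap window in seconds, and hypothetical function and parameter names.

```python
def allreduce_seconds(n_params, bits_per_param, m_replicas, bandwidth_gbps):
    """Time for one all-reduce, assuming a ring all-reduce cost model."""
    if m_replicas < 2:
        return 0.0  # a single replica has nothing to exchange
    payload_bits = n_params * bits_per_param
    traffic_bits = 2 * (m_replicas - 1) / m_replicas * payload_bits
    return traffic_bits / (bandwidth_gbps * 1e9)


def compute_utilization(step_seconds, steps_between_syncs, comm_seconds,
                        overlap_seconds):
    """Fraction of wall-clock time spent computing.

    Communication hidden inside the overlap window is free; only the
    remainder (the exposed part) stalls the accelerators.
    """
    compute = steps_between_syncs * step_seconds
    exposed = max(0.0, comm_seconds - overlap_seconds)
    return compute / (compute + exposed)


# Hypothetical example: 1e9 parameters, 16 bits each, M = 2, 16 Gbps link.
comm = allreduce_seconds(1e9, 16, 2, 16)
print(compute_utilization(1.0, 1, comm, 0.0))    # sync every step, no overlap
print(compute_utilization(1.0, 100, comm, 0.0))  # sync every 100 steps
```

In this model, synchronizing every H steps instead of every step spreads the same communication cost over H times more compute, which is why a larger H raises utilization at a fixed bandwidth.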

Bandwidth Requirements for Target Utilization

Approximate inter-datacenter bandwidth (Gbps) needed to achieve each compute utilization (CU) threshold:

Method             25% CU   50% CU   75% CU   90% CU   95% CU
Data Parallel      -        -        -        -        -
DiLoCo             -        -        -        -        -
Streaming DiLoCo   -        -        -        -        -
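For intuition about how such thresholds could be derived, a simple utilization model can be inverted to give the bandwidth needed for a target CU. The following is a sketch under assumptions and does not reproduce the simulator's numbers: it assumes a ring all-reduce cost of 2(M-1)/M times the payload, a fixed overlap window, and hypothetical names (required_gbps, payload_bits, steps_between_syncs).

```python
def required_gbps(payload_bits, m_replicas, step_seconds, steps_between_syncs,
                  target_cu, overlap_seconds=0.0):
    """Bandwidth (Gbps) needed to reach target_cu in a simple overlap model.

    Model: CU = compute / (compute + exposed), where exposed is the part of
    the communication time that exceeds the overlap window.
    """
    if not 0.0 < target_cu < 1.0:
        raise ValueError("target_cu must be strictly between 0 and 1")
    if m_replicas < 2:
        return 0.0  # nothing to synchronize with a single replica
    traffic_bits = 2 * (m_replicas - 1) / m_replicas * payload_bits
    compute = steps_between_syncs * step_seconds
    # Largest communication time that still meets the target utilization.
    allowed_seconds = overlap_seconds + compute * (1.0 - target_cu) / target_cu
    return traffic_bits / allowed_seconds / 1e9


# Hypothetical example: 1e9 parameters at 16 bits, M = 2, 1-second steps.
payload = 1e9 * 16
for cu in (0.25, 0.50, 0.75, 0.90, 0.95):
    dp = required_gbps(payload, 2, 1.0, 1, cu)         # Data Parallel, H = 1
    diloco = required_gbps(payload, 2, 1.0, 100, cu)   # DiLoCo, assumed H = 100
    print(f"{cu:.0%} CU: data parallel {dp:.2f} Gbps, DiLoCo {diloco:.4f} Gbps")
```

For the streaming variant, the same function can be called with payload_bits divided by P and steps_between_syncs set to H / P, since each send carries one fragment.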