Inference Is the New Bottleneck: How to Plan GPU Capacity for Production AI
Most enterprises sized their AI infrastructure using a playbook written for training, but training is no longer the typical workload. Inference now accounts for roughly two-thirds of all AI compute, and its shape is changing so fast that the rules of thumb from 18 months ago no longer hold. Our view at ClearML is simple: when the workload shifts this much, the platform underneath it has to shift with it.