
Inference Is the New Bottleneck: How to Plan GPU Capacity for Production AI

Most enterprises sized their AI infrastructure with a playbook written for training, but training is no longer the typical workload. Inference now accounts for roughly two-thirds of all AI compute, and it is changing shape fast enough that the rules of thumb from 18 months ago no longer hold. Our view at ClearML is simple: when the workload shifts this much, the platform underneath it has to shift with it.

Pre-Packaged Inference, Production-Grade: AMD AIMs with ClearML

Running production LLM inference on a new accelerator family is a layered problem. The model matters. The runtime available for the GPU you have matters at least as much. So do the precision mode that works without losing accuracy, the inference engine that hits your throughput targets, and the secure endpoint the rest of your stack can actually call. The stack underneath the model is where most of the real engineering work lives and where the cost of getting it wrong shows up first.

Inside NERSC at Berkeley Lab: How a DOE Office of Science User Facility Is Exploring ClearML for Scientific AI Workflows

NERSC, the mission high-performance computing center for the U.S. Department of Energy Office of Science, is using ClearML as part of the AI infrastructure stack for Perlmutter, the upcoming Doudna supercomputer, and the broader American Science Cloud. Here is a look at what they are exploring and why it matters for AI for science at scale.

ClearML and Dell Technologies: A Faster Path to Enterprise AI

Enterprises are buying AI infrastructure faster than their platform teams can operationalize it. Dell and ClearML are working together to close that gap, giving them a faster, simpler path from Dell AI Factory hardware to a production-grade AI platform. Dell supplies the hardware. ClearML provides the AI infrastructure layer on top. Together, the two give platform teams a way to deliver AI as a service to their organization without a multi-year integration project.