Enterprise AI Infrastructure Security Series - Part 5: Compute & Data Access Governance
Securing ClearML for the Enterprise — Part 5: Compute & Data Access Governance
In this video we walk through ClearML's compute governance layer — resource pools, resource profiles, and resource policies — and how they work together to give every team fair, governed access to your GPU infrastructure while keeping it fully utilized.
What we cover:
- The three constructs — resource pools (hardware), resource profiles (job sizes), and resource policies (who gets what)
- How everything in this series operates within a tenant — full separation between tenants, with each tenant's admins governing their own workspace
- The flexibility of these controls — adapting to changing business requirements without re-architecting anything
- Quotas (limits) and reservations — ceilings, priority, and how reservations are not static carve-outs
- Policy priority within a profile — ranking which team's jobs get submitted to hardware first when multiple teams share the same profile
- Pool priority — routing jobs to on-prem hardware first and bursting to cloud only when local capacity is full
- Fractional GPU profiles — dynamic, per-task GPU slicing across MIG, non-MIG, and AMD Instinct hardware without pre-configuration
- Graceful preemption — abort callbacks that let over-quota jobs save checkpoints and restore state when automatically rescheduled
- Static dedication — when you do need to assign specific hardware exclusively to a team
- Demo — creating a half-GPU profile, connecting it to an on-prem RTX 6000 pool with cloud burst overflow, assigning it to an AI Dev team with quotas and reservations
- How compute governance connects back to access rules, vaults, and service accounts — five layers reinforcing each other
- GenAI workload considerations — fine-tuning quotas, inference reservations, and bounded experimentation
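The graceful-preemption pattern above boils down to checkpoint-on-abort plus resume-on-reschedule. Here is a minimal, self-contained sketch of that pattern in plain Python — in a real ClearML job the abort hook would be wired through the ClearML SDK's task abort callback rather than called directly, and the checkpoint file name and training loop here are purely illustrative stand-ins:

```python
# Sketch of checkpoint-on-abort / resume-on-reschedule (the pattern behind
# graceful preemption). All names here are illustrative; a real ClearML job
# would register the save hook via the SDK's task abort callback instead of
# invoking it inline.
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "demo_ckpt.json")  # hypothetical path


def save_checkpoint(step, state):
    """Invoked when the scheduler preempts an over-quota job."""
    with open(CKPT, "w") as f:
        json.dump({"step": step, "state": state}, f)


def load_checkpoint():
    """On reschedule, resume from the last saved step instead of step 0."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            ckpt = json.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {"loss": None}


def train(total_steps=10, abort_at=None):
    step, state = load_checkpoint()
    while step < total_steps:
        state["loss"] = 1.0 / (step + 1)  # stand-in for real training work
        step += 1
        if abort_at is not None and step == abort_at:
            save_checkpoint(step, state)  # graceful preemption point
            return step, state
    return step, state


# Start from a clean slate, then simulate a preemption at step 4
# followed by a rescheduled run that resumes from the checkpoint.
if os.path.exists(CKPT):
    os.remove(CKPT)
first_stop, _ = train(abort_at=4)   # preempted run stops at step 4
resumed_stop, state = train()       # rescheduled run resumes and finishes
```

The point is that the rescheduled run picks up at step 4 rather than restarting, which is what makes quota-driven preemption cheap for long training jobs.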
Previous videos in this series:
- Part 1 — Introduction to the Six Layers of Enterprise Security: https://www.youtube.com/watch
- Part 2 — Identity Provider Setup, Group Sync & Access Rules: https://www.youtube.com/watch
- Part 3 — Configuration Governance with Administrator Vaults: https://youtu.be/vse_015TaWM
- Part 4 — Service Accounts & Automation Security: https://youtu.be/aPyVLSOp_4I
This is Part 5 of our series on enterprise AI infrastructure security. Whether you're an IT director managing shared GPU infrastructure, a platform engineer designing resource policies, or a team lead trying to understand how your teams get fair access to compute — this walkthrough covers the practical, hands-on configuration from start to finish.
🔗 Links & Resources
- ClearML Enterprise: https://clear.ml/enterprise
- ClearML Docs — Resource Policies: https://clear.ml/docs/latest/docs/webapp/resource_policies/
- ClearML Docs — Resource Configuration: https://clear.ml/docs/latest/docs/webapp/settings/webapp_settings_resource_configs/
- ClearML Docs — Fractional GPUs: https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_fractional_gpus/
- ClearML Docs — Dynamic GPU Allocation: https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_dynamic_gpus/
- ClearML Blog — Dynamic Fractional GPUs: https://clear.ml/blog/maximizing-gpu-utilization-with-clearmls-dynamic-fractional-gpus-unleashing-the-full-power-of-your-ai-infrastructure