GPU fleets

You can create your own GPU fleet cluster to manage multi-GPU deployments. Ideal for:

Scalable AI inference to cope with dynamic request loads
Affordable resources with scale to zero by default
Deploy a Ray cluster for fine tuning or custom distributed workloads

What's a Kalavai GPU fleet?

A Kalavai fleet, or a GPU pool, is a managed control plane for multi-GPU deployments. Think of it as your own personal AI cloud provider interface, but with none of the devops overhead.

With a GPU fleet you get access to all built-in kalavai-templates, which let you deploy across GPUs:

AI models via model engines (vLLM, llama.cpp)
Ray clusters for training and fine tuning at scale
Serverless containers

Requirements

An account in our platform
Join the Beta Tester Program

Getting started

To create a new Kalavai fleet, head over to the platform and navigate to the Serverless page. Click on Create new deployment and select the Kalavai-pool template.

Create kalavai pool