Public Petals Swarm: BitTorrent-style LLMs
Contribute to the public Petals swarm and help deploy and fine-tune Large Language Models across consumer-grade devices. See more about the Petals project here. You'll get:
- Eternal kudos from the community!
- Access to all the models in the swarm
- Easy access for inference (via the Petals SDK or an installation-free Kalavai endpoint)
Requirements
- A free Kalavai account. Create one here.
- A computer with the minimum requirements (see below)
Hardware requirements:
- 1+ NVIDIA GPU
- 2+ CPUs
- 4GB+ RAM
- Free disk space of 4x the available VRAM (for an 8GB VRAM GPU, you'll need ~32GB of free disk space)
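If you want a quick sanity check before joining, standard tools can confirm your VRAM and free disk space (a minimal sketch; device names and mount points vary per machine):
$ nvidia-smi --query-gpu=name,memory.total --format=csv
# Lists each GPU with its total VRAM
$ df -h .
# Shows free disk space on the current partition; you need roughly 4x your total VRAM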
How to join
- Create a free account with Kalavai.
- Install the kalavai client following the instructions here. Currently we support Linux distros and Windows.
- Get the joining token. Visit our platform and go to Community pools, then click Join on the Petals pool to reveal the joining details. Copy the command (including the token).
- Authenticate the computer you want to use as a worker:
$ kalavai login
[10:33:16] Kalavai account details. If you don't have an account, create one at https://platform.kalavai.net
User email: <your email>
Password: <your password>
[10:33:25] <email> logged in successfully
- Join the pool with the following command:
$ kalavai pool join <token>
[16:28:14] Token format is correct
Joining private network
[16:28:24] Scanning for valid IPs...
Using 100.10.0.8 address for worker
Connecting to PETALS @ 100.10.0.9 (this may take a few minutes)...
[16:29:41] Workspace created
You are connected to PETALS
Check Petals health
Kalavai's pool connects directly to the public Petals swarm, which means we can use its public health check UI to see how much we are contributing and which models are ready to use.
Models with at least one copy of each shard (a green dot in each column) are ready to be used. If not, wait for more workers to join in.
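If you prefer to check programmatically rather than through the UI, the health monitor also publishes its state as JSON. The sketch below assumes the /api/v1/state endpoint and the model_reports field exposed by the petals-infra health monitor; verify against https://health.petals.dev if the response shape differs:
import requests

# Assumed endpoint: the public health monitor's JSON state
state = requests.get("https://health.petals.dev/api/v1/state", timeout=30).json()

# Print each model and its health state (field names are assumptions)
for report in state.get("model_reports", []):
    print(report.get("name"), "-", report.get("state"))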
Using the kalavai client you can monitor the state of the pool and all of the connected nodes:
$ kalavai pool status
# Displays the status of the pool
$ kalavai node list
# Displays the list of connected nodes, and their current status
The command kalavai node list
is useful to see if our node has any issues and whether it's currently online.
How to use the models
For all public swarms you can use the Petals SDK in the usual way. Here is an example:
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM
# Choose any model available at https://health.petals.dev
model_name = "mistralai/Mixtral-8x22B-Instruct-v0.1"
# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)
# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0])) # A cat sat on a mat...
This path is great if you are a developer with Python installed and don't mind installing the Petals SDK. If you want an install-free path, Kalavai deploys a single endpoint for models, which lets you run inference via gRPC and HTTP requests. Substitute KALAVAI_ENDPOINT with the endpoint displayed under the Community Pools page. Here is a request example:
"""
More info: https://github.com/petals-infra/chat.petals.dev
Required: pip install websockets
"""
import time
import json
import websockets
import asyncio
KALAVAI_ENDPOINT = "192.168.68.67:31220" # <-- change for the kalavai endpoint
MODEL_NAME = "mistralai/Mixtral-8x22B-Instruct-v0.1" # <-- change for the models available in Kalavai PETALS pool.
async def ws_generate(text, max_length=100, temperature=0.1):
    async with websockets.connect(f"ws://{KALAVAI_ENDPOINT}/api/v2/generate") as websocket:
        try:
            # Open an inference session for the chosen model
            await websocket.send(
                json.dumps({"model": MODEL_NAME, "type": "open_inference_session", "max_length": max_length})
            )
            result = json.loads(await websocket.recv())
            if not result["ok"]:
                return result
            # Request generation
            await websocket.send(
                json.dumps({
                    "type": "generate",
                    "model": MODEL_NAME,
                    "inputs": text,
                    "max_length": max_length,
                    "temperature": temperature
                })
            )
            # The server may stream the answer across several messages;
            # accumulate output until it signals that generation has stopped
            outputs = ""
            token_count = 0
            while True:
                result = json.loads(await websocket.recv())
                if not result["ok"]:
                    return result
                outputs += result.get("outputs") or ""
                token_count += result.get("token_count", 0)
                if result.get("stop", False):
                    break
            return {"ok": True, "outputs": outputs, "token_count": token_count}
        except Exception as e:
            return {"error": str(e)}

if __name__ == "__main__":
    t = time.time()
    # asyncio.run replaces the deprecated get_event_loop().run_until_complete
    output = asyncio.run(ws_generate(text="Tell me a story: "))
    final_time = time.time() - t
    print(f"[{final_time:.2f} secs]", output)
    print(f"{output['token_count'] / final_time:.2f}", "tokens/s")
NOTE: the endpoints are only reachable from worker nodes, not from any other computer.
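For one-off requests, the same backend also exposes a plain HTTP API, so you don't need to hold a websocket open. This sketch follows the chat.petals.dev HTTP API linked above; treat the route and parameters as assumptions and check that repo if your deployment differs:
"""
HTTP variant of the websocket request above (a sketch; run it from a worker node).
Required: pip install requests
"""
import requests

KALAVAI_ENDPOINT = "192.168.68.67:31220"  # <-- change for the kalavai endpoint
MODEL_NAME = "mistralai/Mixtral-8x22B-Instruct-v0.1"  # <-- change for a model available in the pool

# /api/v1/generate and its parameters are taken from chat.petals.dev;
# adjust if your deployment exposes a different route
response = requests.post(
    f"http://{KALAVAI_ENDPOINT}/api/v1/generate",
    data={
        "model": MODEL_NAME,
        "inputs": "Tell me a story: ",
        "max_new_tokens": 50,
        "temperature": 0.1,
    },
    timeout=600,
)
print(response.json())  # e.g. {"ok": true, "outputs": "..."}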
Stop sharing
You can either pause sharing, or stop and leave the pool altogether (don't worry, you can rejoin using the same steps above anytime).
To pause sharing (but remain on the pool), run the following command:
kalavai pool pause
When you are ready to resume sharing, run:
kalavai pool resume
To stop and leave the pool, run the following:
kalavai pool stop
FAQs
Can I join (and leave) whenever I want?
Yes, you can, and we won't hold a grudge if you need to use your computer. You can pause or quit altogether as indicated here.
What is in it for me?
If you decide to share your compute with the community, not only will you get access to all the models we deploy on it, but you will also accumulate credits in Kalavai, redeemable for computing in any other public pool (this feature is coming really soon).
Is my data secured / private?
The public pool in Kalavai has the same level of privacy and security as the general Petals public swarm. See their privacy details here. In the future we will improve support for private swarms; at the moment private swarms are a beta feature for all Kalavai pools and can be used via the petals template.
Is my GPU constantly being used?
Yes and no. The model weights for the shard you are responsible for are loaded in GPU memory for as long as your machine is sharing. However, this does not mean the GPU is constantly active (doing computation); computation (and hence the vast majority of the energy consumption) only happens when your shard is summoned to process inference requests.
If at any point you need your GPU memory back, pause or stop sharing and come back when you are free.
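You can see this for yourself with standard NVIDIA tooling: while sharing, memory usage stays high but GPU utilization sits near 0% between requests (output format varies by driver version):
$ nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv
# memory.used stays high while the shard is loaded;
# utilization.gpu is ~0% unless an inference request is being served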