Announcement
Jul 10, 2025
Launch Week Day 3: Announcing Multi-Region Deployments

Michael Louis
Founder & CEO
As AI applications become more real-time, personalised, and privacy-sensitive, one thing becomes clear: where your code runs matters just as much as how it runs. Whether you’re building a lightning-fast voice assistant, an LLM-powered agent, or a healthcare tool with sensitive data, latency and data residency are critical — your users expect instant responses, and regulators demand strict boundaries on where data lives.
Today, we’re excited to introduce multi-region deployments in Cerebrium — now in beta.
With multi-region support, you can now deploy your apps across three continents:
🇺🇸 us-east-1 (N. Virginia)
🇬🇧 eu-west-2 (United Kingdom)
🇮🇳 ap-south-1 (India) - coming soon
This allows you to reduce latency, meet regulatory requirements, and increase fault tolerance — all while using the same Cerebrium interface and developer workflow you’re used to.
How to Use
Make sure you are running the latest version of the Cerebrium PyPI package:
pip install --upgrade cerebrium
In your cerebrium.toml, just set the region:
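For example, to target the UK region (shown here under the hardware section; the exact placement of the key may differ, so check the docs):

[cerebrium.hardware]
# Assumed location of the region setting; see the Cerebrium docs for specifics
region = "eu-west-2"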
Or, if you're using our newly released `cerebrium run` CLI command, you can do:
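Something like the following (the `--region` flag name is an assumption; run `cerebrium run --help` for the exact syntax):

cerebrium run --region eu-west-2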
Behind the scenes, your deployment spins up in that region, with CPU/GPU-backed compute, access to secrets, and its own isolated storage volume.
Please note: if you previously deployed applications on Cerebrium, this new version updates our URL structure to accommodate future functionality.
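As a purely illustrative example of what a region-scoped endpoint could look like (the project ID and app name are placeholders, and the exact format may differ; your app's actual URL is shown in the dashboard):

# Hypothetical region-scoped endpoint; project ID and app name are placeholders
curl -X POST https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run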
Things to Know
Each region has its own isolated storage. Files written to /persistent-storage in one region won't be accessible in another (for now); see the sketch after this list.
Your app won't automatically replicate across regions — deployments are region-specific (for now).
GPU availability varies by region. For example, ADA_L40S is currently only available in the US.
Pricing is currently the same globally to make it easier for customers to reconcile, but this might change in the future.
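To make the storage isolation concrete, here's a minimal sketch of the same app deployed to two regions. The /persistent-storage mount is the real path from above; the file name and function names are illustrative:

# main.py: deployed to both us-east-1 and eu-west-2
from pathlib import Path

CACHE = Path("/persistent-storage/model-cache.bin")  # region-local volume

def warm_cache(data: bytes) -> None:
    # This write lands on the volume of whichever region served the request;
    # the copy of the app running in the other region will not see this file.
    CACHE.write_bytes(data)

def read_cache() -> bytes | None:
    # Returns None on a cold region even if the other region already wrote it.
    return CACHE.read_bytes() if CACHE.exists() else None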
Performance
Previously, if you were in London and making requests to our servers in us-east-1, you’d experience 150–250ms of network latency. Now, by deploying directly in the UK, that latency drops to just 30–70ms, a 60% decrease! For real-time applications like voice, chat, or interactive agents, that’s the difference between a snappy experience and a noticeable lag.
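If you want to measure this for your own app, here's a rough sketch. The regional endpoint URLs and auth token are placeholders; substitute the actual URLs for your deployments:

import time
import urllib.request
import urllib.error

# Placeholder endpoints: substitute your real project ID, app name, and path.
ENDPOINTS = {
    "us-east-1": "https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run",
    "eu-west-2": "https://api.aws.eu-west-2.cerebrium.ai/v4/p-xxxxxxxx/my-app/run",
}

for region, url in ENDPOINTS.items():
    req = urllib.request.Request(url, headers={"Authorization": "Bearer <YOUR_TOKEN>"})
    start = time.perf_counter()
    try:
        urllib.request.urlopen(req, timeout=10)
    except urllib.error.HTTPError:
        pass  # even an error response gives us a valid round-trip time
    except OSError as exc:
        print(f"{region}: request failed ({exc})")
        continue
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{region}: {elapsed_ms:.0f}ms round trip")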
Why We Built This
The infrastructure behind global AI apps shouldn’t be a blocker. Developers building high-performance AI experiences often face three key challenges:
Latency: Your app might be blazing fast locally, but your users are across the world. Waiting 300ms for a response just isn’t good enough anymore.
Compliance: Regulatory frameworks like GDPR and CCPA require data to stay within specific regions. That’s non-negotiable for many enterprise use cases.
Availability: Downtime in one region shouldn’t mean downtime for your whole app.
Multi-region deployments let you solve all three. Now you can run closer to your users, store data where it legally needs to live, and build systems that stay up even if one region goes down.
What’s Next
This is just the beginning of our global deployment story.
In the coming months, we’re working on:
Automatic regional failover for even higher availability
Edge-aware routing to send traffic to the closest deployment
Cross-region persistent storage sync so your data moves with your code
You can read more about the functionality available to you in our docs here. If you want to see support for other regions or cloud providers, let us know on Discord. Happy building!