Announcement
Jul 10, 2025
Launch Week Day 3: Announcing Multi-Region Deployments

Michael Louis
Founder & CEO
As AI applications become more real-time, personalised, and privacy-sensitive, one thing becomes clear: where your code runs matters just as much as how it runs. Whether you’re building a lightning-fast voice assistant, an LLM-powered agent, or a healthcare tool with sensitive data, latency and data residency are critical — your users expect instant responses, and regulators demand strict boundaries on where data lives.
Today, we’re excited to introduce multi-region deployments in Cerebrium — now in beta.
With multi-region support, you can now deploy your apps across three continents:
🇺🇸 us-east-1 (N. Virginia)
🇬🇧 eu-west-2 (United Kingdom)
🇮🇳 ap-south-1 (India) - coming soon
This allows you to reduce latency, meet regulatory requirements, and increase fault tolerance — all while using the same Cerebrium interface and developer workflow you’re used to.
How to Use
Make sure you are running the latest version of the Cerebrium PyPI package:
pip install --upgrade cerebrium
In your cerebrium.toml, just set the region:
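For example, to target the UK region (shown here under the hardware section; the exact placement of the key may differ, so check the docs):

[cerebrium.hardware]
# Assumed location of the region setting; see the Cerebrium docs for specifics
region = "eu-west-2"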
Or, if you're using our newly released `cerebrium run` CLI command, you can do:
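Something like the following (the `--region` flag name is an assumption; run `cerebrium run --help` for the exact syntax):

cerebrium run --region eu-west-2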
Behind the scenes, your deployment spins up in that region, with CPU/GPU-backed compute, access to secrets, and its own isolated storage volume.
Please note: if you previously deployed applications on Cerebrium, this new version updates our URL structure to accommodate future functionality.
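As a purely illustrative example of what a region-scoped endpoint could look like (the project ID and app name are placeholders, and the exact format may differ; your app's actual URL is shown in the dashboard):

# Hypothetical region-scoped endpoint; project ID and app name are placeholders
curl -X POST https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run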
Things to Know
Each region has its own isolated storage. Files written to /persistent-storage in one region won't be accessible in another (for now); see the sketch after this list.
Your app won't automatically replicate across regions — deployments are region-specific (for now).
GPU availability varies by region. For example, ADA_L40S is currently only available in the US.
Pricing is currently the same globally to make it easier for customers to reconcile, but this might change in the future.
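To make the storage isolation concrete, here's a minimal sketch of the same app deployed to two regions. The /persistent-storage mount is the real path from above; the file name and function names are illustrative:

# main.py: deployed to both us-east-1 and eu-west-2
from pathlib import Path

CACHE = Path("/persistent-storage/model-cache.bin")  # region-local volume

def warm_cache(data: bytes) -> None:
    # This write lands on the volume of whichever region served the request;
    # the copy of the app running in the other region will not see this file.
    CACHE.write_bytes(data)

def read_cache() -> bytes | None:
    # Returns None on a cold region even if the other region already wrote it.
    return CACHE.read_bytes() if CACHE.exists() else None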
Performance
Previously, if you were in London and making requests to our servers in us-east-1, you’d experience 150–250ms of network latency. Now, by deploying directly in the UK, that latency drops to just 30–70ms, a 60% decrease! For real-time applications like voice, chat, or interactive agents, that’s the difference between a snappy experience and a noticeable lag.
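If you want to measure this for your own app, here's a rough sketch. The regional endpoint URLs and auth token are placeholders; substitute the actual URLs for your deployments:

import time
import urllib.request
import urllib.error

# Placeholder endpoints: substitute your real project ID, app name, and path.
ENDPOINTS = {
    "us-east-1": "https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run",
    "eu-west-2": "https://api.aws.eu-west-2.cerebrium.ai/v4/p-xxxxxxxx/my-app/run",
}

for region, url in ENDPOINTS.items():
    req = urllib.request.Request(url, headers={"Authorization": "Bearer <YOUR_TOKEN>"})
    start = time.perf_counter()
    try:
        urllib.request.urlopen(req, timeout=10)
    except urllib.error.HTTPError:
        pass  # even an error response gives us a valid round-trip time
    except OSError as exc:
        print(f"{region}: request failed ({exc})")
        continue
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{region}: {elapsed_ms:.0f}ms round trip")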
Why We Built This
The infrastructure behind global AI apps shouldn’t be a blocker. Developers building high-performance AI experiences often face three key challenges:
Latency: Your app might be blazing fast locally, but your users are across the world. Waiting 300ms for a response just isn’t good enough anymore.
Compliance: Regulatory frameworks like GDPR and CCPA require data to stay within specific regions. That’s non-negotiable for many enterprise use cases.
Availability: Downtime in one region shouldn’t mean downtime for your whole app.
Multi-region deployments let you solve all three. Now you can run closer to your users, store data where it legally needs to live, and build systems that stay up even if one region goes down.
What’s Next
This is just the beginning of our global deployment story.
In the coming months, we’re working on:
Automatic regional failover for even higher availability
Edge-aware routing to send traffic to the closest deployment
Cross-region persistent storage sync so your data moves with your code
You can read more about the functionality available to you in our docs here. If you want to see support for other regions or cloud providers, let us know on Discord. Happy building!