Announcement
Jul 9, 2025
Introducing Cerebrium run: The Fastest Way to Execute Cloud Code

Michael Louis
Founder & CEO
One of the biggest challenges developers face today—especially when working with AI and cloud applications—is the time it takes to go from idea to execution. Waiting for infrastructure to spin up, setting up local environments, and managing GPU access can all slow down iteration dramatically.
At Cerebrium, we’ve been building infrastructure that gets out of your way so you can move fast. Previously, our customers could deploy their application and have it live in the cloud in 8-10 seconds. That was acceptable for many, but we still weren’t satisfied.
Introducing cerebrium run: a dead-simple way to execute code in the cloud in 1-2 seconds, with no provisioning delays and no CI/CD.
Whether you’re testing, debugging, or iterating on new features, cerebrium run is your shortcut to speed and flexibility. Backed by CPUs and GPUs, with access to your secrets, persistent storage, and real-time logs, it makes development dramatically better!
Here’s how it works:
Say you have a simple function like this:
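# main.py: a minimal example; the function name and argument match the command below
def squared(value: int) -> int:
    result = int(value) ** 2
    print(f"{value} squared is {result}")
    return result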
With cerebrium run, you can execute it remotely in the cloud with the following command:
cerebrium run main.py::squared --value 2
Within ~2 seconds, your code runs in the cloud, and you’ll see real-time logs streamed back to your CLI.
If you want to change hardware resources, simply add a cerebrium.toml file to the directory. You also have access to all the typical Cerebrium features, such as reading secrets stored through your dashboard or writing to your persistent storage volume. Let’s look at an example that copies a file from an S3 bucket to a Cerebrium storage volume.
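Here’s a rough sketch of what that could look like (the secret names and the /persistent-storage mount path are placeholders; adjust them to match your own project):

# copy_to_volume.py: illustrative sketch. Assumes boto3 is among your dependencies,
# your AWS credentials are stored as Cerebrium secrets (read here as environment
# variables), and your persistent volume is mounted at /persistent-storage.
import os
import boto3

def copy_from_s3(bucket: str, key: str) -> str:
    s3 = boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    # Download the object straight onto the persistent volume
    destination = os.path.join("/persistent-storage", os.path.basename(key))
    s3.download_file(bucket, key, destination)
    print(f"Copied s3://{bucket}/{key} to {destination}")
    return destination

You could then kick it off with something like cerebrium run copy_to_volume.py::copy_from_s3 --bucket my-bucket --key data/weights.bin.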
Once satisfied, you can run cerebrium deploy and it will become a scalable REST endpoint that can scale to thousands of containers in seconds. The difference between cerebrium run and cerebrium deploy is that the former is ephemeral.
Why We Built This
We built cerebrium run because testing and executing cloud code shouldn’t require a full deployment pipeline or manual infrastructure setup. Developers deserve a tool that’s as fast and flexible as their local machine, but with the power of the cloud.
With cerebrium run, you can:
Quickly iterate on application logic without worrying about CI/CD or the wheel of death
Run unit tests in your actual production environment, with real secrets and volumes
Trigger one-off tasks like compiling TensorRT engines, preprocessing data, or running migrations
Leverage GPU acceleration for any compute-heavy job, without provisioning or scaling anything
All of this happens inside an isolated, serverless environment that has access to your secrets, storage volumes, and environment variables — exactly like your production setup.
In short: Our customers needed something that makes cloud development feel local, and we built it.
Still Early — But Powerful
This is the first version of cerebrium run, and there’s a lot more coming:
Even faster performance
Support for Dockerfiles
Support for FastAPI apps
Support for background jobs and async functions
To try it out, please install the latest CLI version (pip install --upgrade cerebrium) and test it on any Python file.
If you have feedback or ideas, we’d love to hear from you — we’re building this for you.