
Scaling AI Tutors: Creatium’s Journey to Realtime Global AI Avatars with Cerebrium
"
"
By offloading infrastructure management to Cerebrium, our team is now able to focus entirely on our core product… Users now start interacting with our AI experiences near-instantly, which was simply not achievable with our previous infrastructure.
Dr. Deepak Sekar
CEO and Co-Founder


Use case
Real-time AI avatar rendering, GPU burst training for AI models, low-latency serverless inference for interactive learning
Location
Cincinnati, Ohio
Customer since
October 2025
Features used
Dedicated AI onboarding engineer & realtime Slack support, Limitless autoscaling globally, Fast cold starts
Highlights
~10-second GPU cold boot times (down from minutes), 99.97% production uptime, Global deployments, Consolidation of multi-cloud infrastructure, & Faster time-to-production
Want to learn more?
Find your perfect fit with a tailored demo
Introduction
Creatium is redefining what online learning looks like. Instead of relying on traditional video lectures followed by quizzes, the platform transforms educational content into immersive, AI-powered learning experiences designed around how people actually learn. Organizations ranging from Fortune 100 companies to K–12 school districts use Creatium to build AI coaches and interactive lessons that rival the experience of a human tutor.
At the center of the platform is a combination of pedagogical science, real-time AI systems, and GPU-accelerated rendering. This architecture enables highly interactive educational experiences while also dramatically improving productivity for instructional design teams, allowing them to build sophisticated learning content significantly faster than traditional approaches.
The Challenge
As Creatium’s product evolved into a real-time AI platform, infrastructure limitations quickly became a major bottleneck. The team struggled with slow GPU server provisioning, unpredictable availability during peak load, and reliability issues that impacted the user experience. For a product built around interactive AI tutors and avatars, waiting minutes for infrastructure to spin up created friction that undermined the seamless experience Creatium wanted to deliver.
Scaling the platform also placed a heavy operational burden on the engineering team. The company relied on multiple cloud providers and maintained custom scaling logic to deal with recurring GPU capacity issues. Managing multi-cloud infrastructure, building internal tooling, and handling deployment complexity began to pull engineers away from the core mission of improving the learning product.
Creatium also runs a wide range of AI workloads beyond real-time avatars, including LLMs, extraction pipelines, guardrails, and text-to-speech. Each of these workloads has different infrastructure requirements, traffic profiles, and scaling behavior. Choosing the right hardware, tuning autoscaling policies, and keeping everything performant as the product evolved created growing operational complexity. As the number of services increased, so did the maintenance burden of the underlying infrastructure, compounding the slowdown in the development team's ability to deploy and iterate quickly.
Evaluation & Implementation
Creatium evaluated several infrastructure providers, including Amazon Web Services, Microsoft Azure, and RunPod, before ultimately selecting Cerebrium. The platform initially stood out as a strong fit for Creatium's inference workloads, but the team soon realized it could support their entire AI pipeline, spanning both training and inference, better than the other solutions.
Creatium runs a combination of workloads: intensive training jobs for avatar, voice, and LLM models on high-end GPUs like H100s and H200s, alongside real-time inference on more cost-efficient GPUs such as L40s and L4s. Cerebrium's serverless infrastructure made it possible to dynamically provision both tiers on demand while maintaining fast cold-start performance, which is critical for real-time AI interactions. Implementation was smooth, supported by direct collaboration with the Cerebrium engineering team via Slack, which helped the team quickly migrate and expand their workloads.
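Cerebrium apps are configured declaratively, so each workload tier can pin its own hardware and autoscaling behavior. The sketch below is illustrative only: the key names and values are assumptions based on Cerebrium's general configuration style, not taken from Creatium's actual setup, and should be checked against Cerebrium's current documentation. It shows how a cost-efficient real-time inference tier might be described separately from a high-end training tier.

```toml
# Hypothetical config for a real-time inference service (field names are
# assumptions; consult Cerebrium's docs for the exact schema).
[cerebrium.deployment]
name = "avatar-inference"

[cerebrium.hardware]
gpu = "L40"          # cost-efficient GPU for latency-sensitive inference
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0     # scale to zero when idle; fast cold starts make this viable
max_replicas = 20    # burst capacity for peak classroom hours
cooldown = 60        # seconds to wait before scaling a replica down
```

A training job would use a parallel config pointing at H100/H200-class hardware, letting the same platform serve both tiers without custom multi-cloud scaling logic.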
The Results
After migrating, Creatium saw immediate improvements in performance and reliability. GPU cold-start times dropped from several minutes to roughly 10 seconds, allowing users to begin interacting with AI-powered learning experiences almost instantly. This eliminated the need for workarounds the team had previously built, such as artificial gamification elements designed to hide infrastructure boot times.
By consolidating workloads that had previously been spread across multiple infrastructure providers, Cerebrium reduced operational complexity and automated much of the performance and scaling optimization. Engineers no longer need to continuously tune workloads or maintain complex multi-cloud logic, and new team members can ramp up quickly without deep infrastructure expertise. This freed the development team to focus almost entirely on improving the core product, while the platform continues to deliver 99.97% uptime and cost efficiency, providing the reliability and financial predictability needed to scale.
Trying out AI at your company?
We offer up to $1,000.00 in free credits and face-time with our engineers to get you started.
Company
Resources
© 2025 Cerebrium, Inc.