Tutorial

Jul 31, 2025

Integrating PayPal’s Model Context Protocol (MCP) into a Real-time Voice Agent

Michael Louis

Founder & CEO

In this tutorial, we’ll build a real-time voice assistant that interacts with PayPal’s Model Context Protocol (MCP). The agent will listen to spoken input, route it through an LLM with access to PayPal tools (like creating invoices or managing subscriptions), and respond in natural-sounding speech — all in real time.

Using MCP over voice unlocks powerful, action-oriented assistants that don’t just reply, but actually perform real tasks. Whether it’s sending an invoice, pausing a subscription, or processing a refund, users can complete actions instantly through natural conversation—no clicking through dashboards or waiting on support.

We’ll use Pipecat to orchestrate the voice pipeline and Cerebrium to run it serverlessly.

You can view the final version of the code here.

🔧 Prerequisites

Before getting started, make sure you have the following:

  • A Cerebrium account and the Cerebrium CLI installed

  • API keys for Daily, Deepgram, OpenAI, and Cartesia

  • A PayPal developer account (used later to generate an access token)

Create Cerebrium Project

  1. Run cerebrium init paypal-mcp-agent

  2. You can add the above keys to a .env file in the folder like:

    DAILY_TOKEN=
    DEEPGRAM_API_KEY=
    OPENAI_API_KEY=
    CARTESIA_API_KEY=
    
    
  3. Add dependencies to your cerebrium.toml:

    [cerebrium.deployment]
    name = "paypal-mcp-agent"
    python_version = "3.11"
    include = ["./*", "main.py", "cerebrium.toml"]
    exclude = ["./example_exclude"]
    
    [cerebrium.hardware]
    region = "us-east-1"
    provider = "aws"
    compute = "CPU"
    cpu = 6
    memory = 14.0
    
    [cerebrium.scaling]
    min_replicas = 0
    max_replicas = 2
    cooldown = 180
    replica_concurrency = 6
    
    [cerebrium.dependencies.pip]
    torch = ">=2.0.0"
    "pipecat-ai[silero, daily, openai, deepgram, cartesia,mcp]" = "==0.0.76"
    aiohttp = ">=3.9.4"
    torchaudio = ">=2.3.0"
    channels = ">=4.0.0"
    requests = "==2.32.2"
    "python-dotenv" = "latest"
    
    [cerebrium.dependencies.apt]
    nodejs = "latest"
    npm = "latest"

This sets up the environment to run both our agent server and the PayPal MCP server.
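Because a missing key only surfaces later as a confusing runtime error inside the pipeline, it can help to validate the environment at startup. Below is an optional sketch (the `missing_keys` and `check_environment` helpers are our own, not part of Pipecat or Cerebrium); you would call `check_environment()` right after `load_dotenv()` in main.py:

```python
import os

# Names the agent reads at startup (matching the .env file above).
REQUIRED_KEYS = ["DAILY_TOKEN", "DEEPGRAM_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"]

def missing_keys(env, required):
    """Return the names of required keys that are absent or empty."""
    return [name for name in required if not env.get(name)]

def check_environment():
    """Raise early, with a clear message, instead of failing mid-call."""
    absent = missing_keys(os.environ, REQUIRED_KEYS)
    if absent:
        raise RuntimeError(f"Missing environment variables: {', '.join(absent)}")
```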

Create Agent Logic

In your main.py, create the logic for your voice agent to interact with the PayPal MCP:

import asyncio
import os
import subprocess
import sys
import time
from multiprocessing import Process
import shutil

import aiohttp
import requests
from loguru import logger
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner

from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from deepgram import LiveOptions
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from mcp import StdioServerParameters
from pipecat.services.mcp_service import MCPClient
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

from dotenv import load_dotenv

load_dotenv()

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")

async def main(room_url: str, token: str, paypal_access_token: str, paypal_environment: str):
    async with aiohttp.ClientSession() as session:
        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_out_enabled=True,
                audio_in_enabled=True,  # the pipeline needs mic audio for STT
                transcription_enabled=False,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.15)),
                vad_audio_passthrough=True,
            ),
        )

        stt = DeepgramSTTService(
            api_key=os.environ.get("DEEPGRAM_API_KEY"),
            live_options=LiveOptions(
                model="nova-3-general",
                language="en-US",
                smart_format=True,
                vad_events=True
            )
        )

        tts = CartesiaTTSService(
            api_key=os.environ.get("CARTESIA_API_KEY"),
            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        )

        logger.debug("Received PayPal access token")  # avoid logging the token itself
        mcp = MCPClient(
            server_params=StdioServerParameters(
                command="npx",
                args=[
                    "-y",
                    "@paypal/mcp",
                    "--tools=all"
                ],
                env={"PAYPAL_ACCESS_TOKEN": paypal_access_token, "PAYPAL_ENVIRONMENT": paypal_environment},
            )
        )
        llm = OpenAILLMService(
            name="LLM",
            model="gpt-4.1",
        )

        # Create a tools schema from the MCP server and register the tools with the LLM
        tools = await mcp.register_tools(llm)

        context = OpenAILLMContext(
            messages = [
                {
                    "role": "system",
                    "content": "You are a helpful assistant with access to PayPal tools. You have access to MCP tools. Before doing a tool call, please say 'Sure, give me a moment'",
                },
            ],
            tools=tools,
        )

        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,  # Speech-to-text
                context_aggregator.user(),
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                allow_interruptions=True,
                enable_metrics=True
            ),
        )

        # When the first participant joins, the bot should introduce itself.
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            # Kick off the conversation.
            await asyncio.sleep(1.5)
            context.messages.append(
                {
                    "role": "system",
                    "content": "Introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'",
                }
            )
            await task.queue_frame(LLMMessagesFrame(context.messages))

        # When the participant leaves, we exit the bot.
        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason):
            await task.queue_frame(EndFrame())

        # If the call is ended make sure we quit as well.
        @transport.event_handler("on_call_state_updated")
        async def on_call_state_updated(transport, state):
            if state == "left":
                await task.queue_frame(EndFrame())

        runner = PipelineRunner()

        await runner.run(task)
        await session.close()

In the above we do the following:

  • We create a Daily meeting room where we interact with our agent. You can switch this transport layer to a phone call using Twilio; just check out the Pipecat documentation

  • We set up our agent model pipeline:

    • Deepgram for STT

    • OpenAI for the LLM (you need an LLM that supports function calling)

    • Cartesia for TTS

  • We set up the PayPal MCP server, which uses your PayPal access token to authenticate against your account (we will show how to generate this token later in the tutorial). Note that it runs server-side with npx, which is why nodejs and npm appear in the apt dependencies. Pipecat ships with MCP support, and you can read more about PayPal’s MCP support here.

  • Lastly, we bring it all together and set the context of the agent so it is aware of its responsibilities.
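For context, `register_tools` converts each MCP tool into an OpenAI function-calling schema. The shape is roughly as follows; the tool name and parameters here are hypothetical placeholders, since the real schemas come from the PayPal MCP server at runtime:

```python
# Illustrative example of an OpenAI function-calling tool schema, the format
# that mcp.register_tools(llm) produces from the MCP server's tool list.
# The tool name and parameters below are hypothetical, not PayPal's actual schema.
example_tool = {
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a PayPal invoice for a customer.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient_email": {"type": "string"},
                "amount": {"type": "string"},
                "currency_code": {"type": "string"},
            },
            "required": ["recipient_email", "amount"],
        },
    },
}
```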

Create PayPal Access Token

In order to generate a PayPal access token, you need to get a ClientID and Client Secret from the PayPal developer dashboard. You can follow the instructions to do that here.

Generate your PayPal access token using the following cURL (passing the credentials via --user lets cURL base64-encode them into the Basic Authorization header for you):

curl --location 'https://api-m.paypal.com/v1/oauth2/token' \
--header 'Accept: application/json' \
--header 'Accept-Language: en_US' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--user 'CLIENT_ID:CLIENT_SECRET' \
--data-urlencode 'grant_type=client_credentials'

This creates a PayPal access token for your production account, so make sure you are using the right Client ID/Client Secret. There is also a helper file in the GitHub repo here.
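If you prefer to script this step, the same token request can be made from Python. The sketch below builds the Basic Authorization header explicitly to show what cURL does under the hood (`requests` could also build it for you via its `auth=` parameter):

```python
import base64

def basic_auth_header(client_id: str, client_secret: str) -> str:
    """Build the Basic Authorization header value from raw credentials."""
    creds = f"{client_id}:{client_secret}".encode()
    return "Basic " + base64.b64encode(creds).decode()

def fetch_paypal_access_token(client_id: str, client_secret: str) -> str:
    """Exchange client credentials for a PayPal access token (production endpoint)."""
    import requests  # local import so the header helper works without requests installed

    response = requests.post(
        "https://api-m.paypal.com/v1/oauth2/token",
        headers={
            "Accept": "application/json",
            "Accept-Language": "en_US",
            "Authorization": basic_auth_header(client_id, client_secret),
        },
        data={"grant_type": "client_credentials"},
    )
    response.raise_for_status()
    return response.json()["access_token"]
```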

Create Daily Room

As mentioned, we are using Daily’s meeting rooms as the transport layer and so we need to create the logic that creates these rooms on demand and then gets our agent to join it. You can add the following to your main.py:

async def start_bot(paypal_access_token: str, paypal_environment: str):
    try:
        room_info = create_room()
        if "status_code" in room_info and room_info["status_code"] != 200:
            logger.error(f"Failed to create room: {room_info}")
            return {"message": "Failed to create room", "status_code": 500}

        room_url = room_info["url"]
        room_token = room_info["token"]
        
        # Start main() in background task so we can return room info immediately
        asyncio.create_task(main(room_url, room_token, paypal_access_token, paypal_environment))
        
        return {
            "message": "Room created successfully",
            "status_code": 200,
            "room_url": room_url
        }
    except Exception as e:
        logger.error(f"Exception in start_bot: {e}")
        return {"message": "Failed to start bot", "status_code": 500}


def create_room():
    url = "https://api.daily.co/v1/rooms/"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('DAILY_TOKEN')}",
    }
    data = {
        "properties": {
            "exp": int(time.time()) + 60 * 5,  # 5 minutes
            "eject_at_room_exp": True,
        }
    }

    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        room_info = response.json()
        token = create_token(room_info["name"])
        if token and "token" in token:
            room_info["token"] = token["token"]
        else:
            logger.error("Failed to create token")
            return {
                "message": "There was an error creating your room",
                "status_code": 500,
            }
        return room_info
    else:
        data = response.json()
        if data.get("error") == "invalid-request-error" and "rooms reached" in data.get(
            "info", ""
        ):
            logger.error(
                "We are currently at capacity for this demo. Please try again later."
            )
            return {
                "message": "We are currently at capacity for this demo. Please try again later.",
                "status_code": 429,
            }
        logger.error(f"Failed to create room: {response.status_code}")
        return {"message": "There was an error creating your room", "status_code": 500}


def create_token(room_name: str):
    url = "https://api.daily.co/v1/meeting-tokens"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('DAILY_TOKEN')}",
    }
    data = {"properties": {"room_name": room_name, "is_owner": True}}

    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        token_info = response.json()
        return token_info
    else:
        logger.error(f"Failed to create token: {response.status_code}")
        return None

🚀 Running the Agent Locally

To run this locally, add the following code to the bottom of your main.py.

if __name__ == "__main__":
    # Initialize main() by creating a room and token
    room_info = create_room()
    if "status_code" in room_info and room_info["status_code"] != 200:
        logger.error(f"Failed to create room: {room_info}")
        sys.exit(1)

    room_url = room_info["url"]
    room_token = room_info["token"]

    asyncio.run(
        main(
            room_url=room_url,
            token=room_token,
            paypal_access_token="<PAYPAL_ACCESS_TOKEN>",
            paypal_environment="<PAYPAL_ENV>",
        )
    )

Run python main.py and you’ll receive a Daily call URL — open it in your browser and start talking!

🌍 Deploying on Cerebrium

To run this with zero provisioning and low latency, you can deploy the code as a Cerebrium application by running cerebrium deploy.

It should return a deployment URL that you can then hit to join a meeting room. The request should look like:

curl --location 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/pipecat-agent/start_bot' \
--header 'Authorization: Bearer <CEREBRIUM_API_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{"paypal_access_token": "<PAYPAL_ACCESS_TOKEN>", "paypal_environment": "<PAYPAL_ENV>"}'
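The same request from Python, as a sketch (the base URL, Cerebrium API token, and PayPal values are placeholders you substitute with your own). One detail worth noting: both PayPal fields must be sent as JSON strings:

```python
import json

def start_bot_payload(access_token: str, environment: str) -> str:
    """Serialize the request body; both fields are JSON strings."""
    return json.dumps({
        "paypal_access_token": access_token,
        "paypal_environment": environment,
    })

def call_start_bot(base_url: str, api_token: str, access_token: str, environment: str):
    """POST to the deployed start_bot endpoint and return the parsed response."""
    import requests  # local import so the payload helper works without requests installed

    response = requests.post(
        f"{base_url}/start_bot",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        data=start_bot_payload(access_token, environment),
    )
    response.raise_for_status()
    return response.json()  # contains room_url on success
```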

You now have a real-time voice agent that can interact with PayPal’s suite of tools. Feel free to look at the final version of the code here.

In this tutorial, we showed how to build a real-time voice agent that not only understands natural speech but can also take meaningful action using PayPal’s Model Context Protocol (MCP). By combining tools like Pipecat for voice orchestration, Cerebrium for serverless infrastructure, and Daily for audio transport, we created an intelligent assistant capable of generating invoices, managing subscriptions, and more—all through conversation. This is just the beginning of what’s possible when you connect LLMs to real-world APIs in real time. Whether you’re building for customer support, internal ops, or merchant tools, voice-driven automation is now within reach.

© 2025 Cerebrium, Inc.