Integrate Seamlessly: Leveraging APIs for Voice-to-Chat Handoffs with Twilio & VAPI

TL;DR

Voice-to-chat handoffs break when session context gets lost between platforms. Build a stateful bridge using VAPI's function calling to trigger Twilio SMS/WhatsApp, passing conversation history and user metadata. Use webhook signatures for security, implement idempotent handoff logic to prevent duplicate messages, and store session state in Redis with 24-hour TTL. Result: users switch channels mid-conversation without repeating themselves.

Prerequisites

API Keys & Credentials

You need active accounts with VAPI (for voice agent orchestration) and Twilio (for SMS/chat routing). Generate your VAPI API key from the dashboard and your Twilio Account SID + Auth Token from the Twilio Console. Store these in a .env file—never hardcode credentials.
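A fail-fast startup check stops a missing credential from surfacing later as a confusing 401 from VAPI or Twilio. This is a minimal sketch. `missingEnvVars` and `assertCredentials` are hypothetical helpers, and the variable names are the ones used in this article's examples:

```javascript
// Hypothetical helper: list required environment variables that are unset or blank
function missingEnvVars(names, env = process.env) {
  return names.filter((name) => !env[name] || env[name].trim() === '');
}

// Variable names used by the examples in this article
const REQUIRED_VARS = [
  'VAPI_API_KEY',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_PHONE_NUMBER',
];

// Call once at startup (after loading .env) to refuse to run without credentials
function assertCredentials(env = process.env) {
  const missing = missingEnvVars(REQUIRED_VARS, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Call `assertCredentials()` right after `require('dotenv').config()` so the server exits immediately instead of failing on the first webhook.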

Runtime & Dependencies

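Node.js 16+ with npm or yarn. Install express (webhook server), twilio (Twilio's Node helper library), and dotenv (loads .env into process.env); the code below uses all three. You'll also need a Twilio phone number provisioned with SMS capability, since the handoff sends the chat link from it.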

Server Infrastructure

A publicly accessible server (localhost won't work). Use ngrok for local development to expose your webhook endpoints, or deploy to Heroku/AWS. VAPI and Twilio must reach your server via HTTPS with valid SSL certificates.

Knowledge Requirements

Familiarity with REST APIs, async/await patterns, and webhook handling. Understanding of session state management and basic authentication (Bearer tokens, HMAC signature validation) is essential for secure handoffs.


Step-by-Step Tutorial

Configuration & Setup

Most voice-to-chat handoffs fail because developers treat VAPI and Twilio as a unified system. They're not. VAPI handles voice intelligence. Twilio routes calls and manages SMS. Your server bridges them.

Server Requirements:

javascript

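// Express server with webhook validation
require('dotenv').config(); // load credentials from .env
const express = require('express');
const crypto = require('crypto');
const twilio = require('twilio');

const app = express();
// Keep the raw request bytes: re-serializing req.body with JSON.stringify
// can change whitespace or key order and break the HMAC comparison
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

const VAPI_API_KEY = process.env.VAPI_API_KEY;
const TWILIO_ACCOUNT_SID = process.env.TWILIO_ACCOUNT_SID;
const TWILIO_AUTH_TOKEN = process.env.TWILIO_AUTH_TOKEN;
const TWILIO_PHONE = process.env.TWILIO_PHONE_NUMBER;

// Webhook signature validation (REQUIRED in production)
// Header name and HMAC scheme as used throughout this article; confirm them
// against the server authentication settings of your VAPI account.
function validateVapiWebhook(req) {
  const signature = req.headers['x-vapi-signature'];
  if (!signature || !req.rawBody) return false;
  const hash = crypto.createHmac('sha256', process.env.VAPI_WEBHOOK_SECRET)
    .update(req.rawBody).digest('hex');
  // Constant-time comparison; timingSafeEqual requires equal lengths
  const a = Buffer.from(signature);
  const b = Buffer.from(hash);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}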

VAPI Assistant Config:

javascript

const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [{
      role: "system",
      content: "You are a support agent. If the user requests chat support, say 'Transferring you to chat now' and end the call."
    }]
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM"
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en"
  },
  endCallFunctionEnabled: true,
  serverUrl: "https://your-domain.ngrok.io/webhook/vapi", // YOUR server receives webhooks here
  serverUrlSecret: process.env.VAPI_WEBHOOK_SECRET
};

Architecture & Flow

mermaid

flowchart LR
    A[User on Call] --> B[VAPI Voice Agent]
    B --> C{Handoff Trigger?}
    C -->|Yes| D[Your Server /webhook/vapi]
    D --> E[Twilio SMS API]
    E --> F[User Receives Chat Link]
    C -->|No| B
    D --> G[End VAPI Call]

Critical Separation: VAPI manages the voice session. Twilio sends the SMS. Your server detects the handoff intent and orchestrates both.
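The ordering is the part that matters: detect intent, send the SMS, and only then end the voice call, so a failed text never leaves the caller with nothing. A minimal sketch with both platforms hidden behind injected adapters. `orchestrateHandoff`, `detectIntent`, `sendSms`, and `endCall` are hypothetical names, and the event shape mirrors the `call.customer.number` field used later in this article:

```javascript
// Sketch of the orchestration order: detect -> send SMS -> end the voice call.
// sendSms and endCall are hypothetical adapters you would back with the Twilio
// client and your VAPI call-control logic; injecting them keeps each platform
// behind its own interface and makes the flow testable.
async function orchestrateHandoff(event, { detectIntent, sendSms, endCall }) {
  if (!detectIntent(event)) {
    return { handedOff: false, reason: 'no-intent' };
  }
  const phone = event.call?.customer?.number;
  if (!phone) {
    // Web calls have no phone number, so there is nowhere to send the SMS
    return { handedOff: false, reason: 'no-phone' };
  }
  // Send the SMS first; if it throws, the call is never ended
  await sendSms(phone, event.call.id);
  await endCall(event.call.id);
  return { handedOff: true };
}
```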

Step-by-Step Implementation

1. Detect Handoff Intent

javascript

// YOUR server's webhook endpoint (not a VAPI API endpoint)
app.post('/webhook/vapi', async (req, res) => {
  if (!validateVapiWebhook(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { message } = req.body;

  // Only act on the post-call summary; ignore other event types
  if (message?.type === 'end-of-call-report') {
    const transcript = message.transcript || '';
    const userPhone = message.call?.customer?.number;
    const callId = message.call?.id;

    // Naive intent detection: look for handoff keywords anywhere in the final
    // transcript. This also matches the assistant's own lines, so expect false positives.
    const handoffKeywords = ['chat', 'text', 'message', 'transfer'];
    const triggeredHandoff = handoffKeywords.some(kw =>
      transcript.toLowerCase().includes(kw)
    );

    if (triggeredHandoff && userPhone) {
      await initiateTextHandoff(userPhone, callId);
    }
  }

  res.status(200).json({ received: true });
});
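Matching keywords against the whole transcript also catches the assistant's own words: the system prompt above literally tells it to say "Transferring you to chat now". A sketch that scans only the caller's lines and matches whole words (so "texture" does not count as "text"). It assumes the end-of-call transcript is a newline-separated string with `User:` and `AI:` prefixes, which you should verify against a real payload; `userRequestedHandoff` is a hypothetical helper, and keywords are assumed to contain no regex metacharacters:

```javascript
// Assumption: transcript lines look like "User: ..." or "AI: ..."
function userRequestedHandoff(transcript, keywords = ['chat', 'text', 'message', 'transfer']) {
  // Whole-word, case-insensitive match for any keyword
  const pattern = new RegExp(`\\b(${keywords.join('|')})\\b`, 'i');
  return transcript
    .split('\n')
    .filter((line) => line.trim().toLowerCase().startsWith('user:'))
    .some((line) => pattern.test(line));
}
```

In the handler, `userRequestedHandoff(transcript)` would replace the `handoffKeywords.some(...)` check.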

2. Send Chat Link via Twilio

javascript

// Create the Twilio client once and reuse it for every handoff
const twilioClient = twilio(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN);

async function initiateTextHandoff(customerPhone, callId) {
  try {
    const chatUrl = `https://your-chat-app.com/session/${encodeURIComponent(callId)}`;

    const message = await twilioClient.messages.create({
      body: `Your support session is ready. Continue here: ${chatUrl}`,
      from: TWILIO_PHONE,
      to: customerPhone
    });

    console.log(`Chat handoff sent to ${customerPhone} (SID ${message.sid})`);
  } catch (error) {
    // Twilio API errors carry a numeric code, e.g. 21211 = invalid 'To' number
    console.error('Twilio SMS Error:', error.code, error.message);
    // Log to monitoring system - handoff failed
  }
}
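The TL;DR calls for idempotent handoff logic. If the same end-of-call-report reaches your server twice (a retry or a duplicate delivery), the caller gets two texts. A minimal in-memory guard keyed by call ID; `HandoffDeduper` is a hypothetical helper, and with more than one server instance you would need a shared store such as Redis instead:

```javascript
// Remembers which call IDs already triggered a handoff, for ttlMs milliseconds
class HandoffDeduper {
  constructor(ttlMs = 24 * 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.seen = new Map(); // callId -> expiry timestamp (ms)
  }

  // Returns true the first time a callId is claimed, false for repeats
  claim(callId) {
    const t = this.now();
    const expiry = this.seen.get(callId);
    if (expiry !== undefined && expiry > t) return false;
    this.seen.set(callId, t + this.ttlMs);
    return true;
  }

  // Drop expired entries so the map does not grow without bound
  prune() {
    const t = this.now();
    for (const [id, expiry] of this.seen) {
      if (expiry <= t) this.seen.delete(id);
    }
  }
}
```

Usage: `if (deduper.claim(callId)) await initiateTextHandoff(userPhone, callId);`, with `prune()` run on a timer.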

Error Handling & Edge Cases

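Race Condition: VAPI sends end-of-call-report after the call has already ended, so nothing you return from that webhook (a function result, a spoken message, an end-call action) can reach the caller anymore. Solution: trigger the handoff from the end-of-call webhook as a side effect that needs no reply, rather than from mid-call function calls.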

Missing Phone Number: message.call?.customer?.number can be undefined for web calls. Always validate:

javascript

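// Inside the handler, before calling initiateTextHandoff():
// E.164 is '+' followed by up to 15 digits, with no leading zero
if (!userPhone || !/^\+[1-9]\d{1,14}$/.test(userPhone)) {
  console.error('Invalid phone number for handoff');
  return; // Cannot send SMS without valid E.164 number
}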

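Twilio Rate Limits: every sending number has a throughput limit (messages per second) that depends on the number type, and Twilio queues anything sent faster than that; large bursts can also trip carrier filtering. In production, put outgoing SMS behind a queue sized to your sender's documented limit rather than a fixed 10 msg/sec, since a standard US long code, for example, is far slower than a short code.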
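A minimal client-side throttle: queue outgoing sends and start at most `perSecond` of them per second. `createThrottledSender` is a hypothetical helper; set `perSecond` from your sender's actual throughput limit:

```javascript
// Wraps an async send function so calls start at most `perSecond` times per second.
// Each call returns a promise that settles with the underlying send's result.
function createThrottledSender(send, perSecond) {
  const intervalMs = 1000 / perSecond;
  const queue = [];
  let nextSlot = 0; // earliest time (ms) the next send may start
  let timer = null;

  function drain() {
    timer = null;
    if (queue.length === 0) return;
    const wait = nextSlot - Date.now();
    if (wait > 0) {
      timer = setTimeout(drain, wait); // too early: try again when the slot opens
      return;
    }
    const { args, resolve, reject } = queue.shift();
    nextSlot = Date.now() + intervalMs;
    Promise.resolve().then(() => send(...args)).then(resolve, reject);
    drain(); // schedule the next queued item, if any
  }

  return (...args) =>
    new Promise((resolve, reject) => {
      queue.push({ args, resolve, reject });
      if (!timer) drain();
    });
}
```

Usage: `const sendSms = createThrottledSender((opts) => twilioClient.messages.create(opts), 1);` then `await sendSms({ body, from, to })`.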

Testing & Validation

Trigger handoff: Call your VAPI number, say "I need chat support"
Check webhook logs: Verify end-of-call-report received with transcript
Confirm SMS delivery: User should receive chat link within 2-3 seconds
Test failure modes: Try invalid phone numbers, expired Twilio credentials

Production Checklist:

Webhook signature validation enabled
Twilio error codes logged (e.g. 30005 = unknown destination handset, 30008 = unknown error)
Chat session expires after 24 hours
Fallback if SMS fails: email or in-app notification

System Diagram

Audio processing pipeline from microphone input to speaker output.

mermaid

graph LR
    A[Microphone] --> B[Audio Buffer]
    B --> C[Voice Activity Detection]
    C --> D{Is Voice Detected?}
    D -- Yes --> E[Speech-to-Text]
    D -- No --> F[Error Handling]
    E --> G[Intent Detection]
    G --> H[Response Generation]
    H --> I[Text-to-Speech]
    I --> J[Speaker]
    F --> K[Log Error]
    K --> L[Retry Mechanism]
    L --> C

Testing & Validation

Most voice-to-chat handoffs fail in production because developers skip local webhook testing. Here's how to validate before deployment.

Local Testing with Vapi CLI

The Vapi CLI webhook forwarder eliminates ngrok complexity. Install and run:

bash

npm install -g @vapi-ai/cli
vapi webhook forward http://localhost:3000/webhook/vapi

This creates a public tunnel to your local server. The CLI outputs a forwarding URL; use it as your serverUrl in assistantConfig. Start your Express server, trigger a test call, and watch your terminal for incoming webhook payloads. If transcript and end-of-call-report events arrive and your handler answers 200 rather than 401, your validation logic works.

Webhook Signature Validation

Test signature verification with a manual curl request:

bash

# Sign the exact body you will send (secret must match the server's VAPI_WEBHOOK_SECRET)
BODY='{"message":{"role":"assistant","content":"I need help"}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "your_server_secret" | awk '{print $NF}')

# Send test webhook with the generated signature
curl -X POST http://localhost:3000/webhook/vapi \
  -H "Content-Type: application/json" \
  -H "x-vapi-signature: $SIG" \
  -d "$BODY"

If validateVapiWebhook returns false, the secret you signed with doesn't match the server's secret (serverUrlSecret / VAPI_WEBHOOK_SECRET). Check for trailing whitespace in environment variables. Valid signatures return 200; invalid ones return 401. To exercise the handoff path, send a payload with type 'end-of-call-report', a transcript containing one of the handoffKeywords values, and call.customer.number set. Then verify that initiateTextHandoff fires and Twilio receives the chatUrl.
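The same check can be exercised from Node without hand-copying signatures. This sketch signs a test end-of-call-report payload the way validateVapiWebhook expects (hex HMAC-SHA256 over the exact body bytes, the scheme this article assumes) and verifies it with a constant-time comparison. `signBody` and `verifyBody` are hypothetical helpers:

```javascript
const crypto = require('crypto');

// Hex HMAC-SHA256 over the exact bytes that will be sent
function signBody(body, secret) {
  return crypto.createHmac('sha256', secret).update(body).digest('hex');
}

// Constant-time comparison of a received signature against the expected one
function verifyBody(body, signature, secret) {
  const expected = Buffer.from(signBody(body, secret));
  const received = Buffer.from(String(signature || ''));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

// Test payload shaped the way the step 1 handler reads it
const body = JSON.stringify({
  message: {
    type: 'end-of-call-report',
    transcript: 'User: I need chat support',
    call: { id: 'test-call-1', customer: { number: '+15555550100' } },
  },
});
const signature = signBody(body, 'your_server_secret');
console.log(`x-vapi-signature: ${signature}`);
```

Send `body` unchanged (for example with curl's `-d`) so the bytes the server hashes are the bytes you signed.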

Real-World Example

Barge-In Scenario

User calls in to check order status. Agent starts reading a long order history. User interrupts mid-sentence: "Just tell me if it shipped." The system must:

Detect the interrupt via STT partial transcripts
Cancel the TTS stream (stop the agent mid-word)
Process the new intent without repeating the interrupted content
Trigger handoff if the query requires human intervention

Here's what breaks in production: Most implementations queue the interrupt AFTER the current TTS finishes. Result? Agent talks for 3-5 more seconds while user repeats themselves. This creates the "talking over each other" failure mode.

javascript

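// Barge-in handler with TTS cancellation
// Per-call state: a single global flag would mix up concurrent calls
const callState = new Map(); // callId -> { isAgentSpeaking, currentTTSStream }

app.post('/webhook/vapi', async (req, res) => {
  const { message } = req.body;
  const callId = message?.call?.id;
  if (!callState.has(callId)) {
    callState.set(callId, { isAgentSpeaking: false, currentTTSStream: null });
  }
  const state = callState.get(callId);

  if (message?.type === 'transcript' && message.transcriptType === 'partial') {
    const transcript = (message.transcript || '').toLowerCase();

    // Detect interrupt during agent speech (length check filters out noise)
    if (state.isAgentSpeaking && transcript.length > 5) {
      console.log(`[INTERRUPT] User spoke during agent turn: "${transcript}"`);

      // Cancel current TTS immediately. currentTTSStream is only set if YOUR
      // server streams the TTS audio; otherwise VAPI handles barge-in itself.
      if (state.currentTTSStream) {
        state.currentTTSStream.abort();
        state.currentTTSStream = null;
      }
      state.isAgentSpeaking = false;

      // Check for handoff keywords
      const handoffKeywords = ['human', 'agent', 'representative', 'help'];
      const triggeredHandoff = handoffKeywords.some(kw => transcript.includes(kw));
      const userPhone = message.call?.customer?.number;

      if (triggeredHandoff && userPhone) {
        await initiateTextHandoff(userPhone, callId);
        return res.json({
          action: 'end-call',
          message: 'Transferring you to SMS support now.'
        });
      }
    }
  }

  if (message?.type === 'speech-start') {
    state.isAgentSpeaking = true;
  }

  if (message?.type === 'speech-end') {
    state.isAgentSpeaking = false;
  }

  // Free per-call state once the call is over
  if (message?.type === 'end-of-call-report') {
    callState.delete(callId);
  }

  res.sendStatus(200);
});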

Event Logs

Real production logs from a handoff scenario (timestamps are HH:MM:SS.mmm):

[14:32:01.234] speech-start: Agent begins TTS
[14:32:02.891] transcript (partial): "just" 
[14:32:03.102] INTERRUPT DETECTED - Cancelling TTS
[14:32:03.156] speech-end: Agent stopped
[14:32:03.401] transcript (final): "just connect me to someone"
[14:32:03.523] Handoff keyword match: "someone"
[14:32:03.687] Twilio SMS initiated: +1234567890
[14:32:03.891] action: end-call

Critical timing: interrupt detection to TTS cancellation = 54ms (03.102 to 03.156). If this exceeds 200ms, users hear overlapping audio. The isAgentSpeaking flag restricts interrupt handling to the agent's turn, and the transcript.length > 5 check filters out short noise fragments.

Edge Cases

Multiple rapid interrupts: User says "wait no actually yes transfer me." Without debouncing, this triggers 3 separate handoff attempts. Solution: 500ms debounce window on handoff actions.
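A sketch of the 500ms window, per call: the first handoff attempt fires immediately and further attempts for the same call inside the window are dropped. Strictly, this is leading-edge throttling rather than a trailing debounce, which suits handoffs because the first match should act at once. `shouldFireHandoff` and its map are hypothetical:

```javascript
// callId -> timestamp (ms) of the last accepted handoff attempt
const lastHandoffAt = new Map();

// Returns true if a handoff for this call may fire now; false inside the window
function shouldFireHandoff(callId, windowMs = 500, now = Date.now()) {
  const last = lastHandoffAt.get(callId);
  if (last !== undefined && now - last < windowMs) {
    return false;
  }
  lastHandoffAt.set(callId, now);
  return true;
}
```

Combine it with a per-call "handoff already triggered" flag so that attempts after the window closes are still ignored.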

False positive from breathing: VAD fires on heavy breathing during agent speech. Mitigation: Require transcript.length > 5 before processing interrupts (filters out gasps, coughs).

Network jitter: partial transcript events can arrive out of order on mobile networks. The transcriptType === 'partial' check only separates partials from finals; it does not reorder anything. Session state must therefore track the last processed timestamp per call and discard stale events.
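A sketch of that timestamp check, assuming each event carries a numeric `timestamp` (the complete example below reads `message.timestamp`; confirm its presence and unit in your payloads). `isFreshEvent` is hypothetical:

```javascript
// callId -> newest event timestamp processed so far
const lastSeenTimestamp = new Map();

// Returns true for events newer than anything already processed for this call;
// stale (out-of-order) or duplicate events return false and should be ignored.
function isFreshEvent(callId, timestamp) {
  if (typeof timestamp !== 'number') return true; // nothing to compare, let it through
  const last = lastSeenTimestamp.get(callId);
  if (last !== undefined && timestamp <= last) return false;
  lastSeenTimestamp.set(callId, timestamp);
  return true;
}
```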

Common Issues & Fixes

Race Conditions During Handoff

The most common production failure: VAPI keeps streaming TTS while your handoff logic fires the Twilio SMS. The user receives the chat link mid-sentence, then hears "I've sent you a chat link" 800ms later. This happens because your webhook handler runs independently of VAPI's audio playback: receiving a transcript event does not stop speech that is already streaming.

javascript

// WRONG: No cancellation guard (and no response is sent, so the request hangs)
app.post('/webhook/vapi', async (req, res) => {
  const { message } = req.body;
  if (message?.role === 'assistant') {
    const transcript = (message.content || '').toLowerCase();
    if (transcript.includes('chat')) {
      await initiateTextHandoff(message.call?.customer?.number, message.call?.id);
    }
  }
});

// CORRECT: Cancel TTS before handoff (register this handler INSTEAD of the one above)
let currentTTSStream = null; // only set if your server streams TTS itself
app.post('/webhook/vapi', async (req, res) => {
  const { message } = req.body;

  if (message?.role === 'assistant') {
    const transcript = (message.content || '').toLowerCase();
    const handoffKeywords = ['chat', 'text me', 'send link'];
    const triggeredHandoff = handoffKeywords.some(kw => transcript.includes(kw));
    const userPhone = message.call?.customer?.number;

    if (triggeredHandoff && userPhone) {
      // Cancel active TTS immediately
      if (currentTTSStream) {
        currentTTSStream.abort();
        currentTTSStream = null;
      }

      await initiateTextHandoff(userPhone, message.call.id);
      return res.json({ action: 'end-call' }); // Terminate to prevent overlap
    }
  }

  res.sendStatus(200);
});

Fix: Track TTS state with currentTTSStream and abort it before sending the SMS. If you'd rather let the assistant finish its sentence, delay the handoff instead: setTimeout(() => initiateTextHandoff(userPhone, callId), 200).

Webhook Signature Failures

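Twilio webhooks fail validation when the URL your server validates against differs from the one Twilio requested (http vs https, host, or port rewritten by a proxy or tunnel), or when a proxy alters the request. X-Twilio-Signature is an HMAC-SHA1 over the full URL plus the POST parameters sorted by name, with no timestamp, so clock drift is not the culprit; any change to the URL or parameters is. A mismatch makes your validator reject the request (typically with a 403). The VAPI check below fails for the analogous reason: its HMAC must be computed over the exact body bytes that were sent.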

javascript

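const crypto = require('crypto');

// Throws on mismatch so the caller can answer 401.
// Prefer the raw body bytes (captured via express.json's verify option):
// re-serializing req.body can change the bytes and break the comparison.
function validateVapiWebhook(req) {
  const signature = String(req.headers['x-vapi-signature'] || '');
  const payload = req.rawBody || JSON.stringify(req.body);
  const hash = crypto.createHmac('sha256', process.env.VAPI_WEBHOOK_SECRET)
    .update(payload)
    .digest('hex');

  // Constant-time comparison; timingSafeEqual requires equal lengths
  const a = Buffer.from(signature);
  const b = Buffer.from(hash);
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) {
    throw new Error('Invalid webhook signature');
  }
}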

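Fix: Validate against the exact public URL Twilio called. Behind nginx, forward the original host and scheme (proxy_set_header Host $host; proxy_set_header X-Forwarded-Proto $scheme;) and call app.set('trust proxy', true) in Express so req.protocol reports https. For the VAPI check, hash the raw request body instead of JSON.stringify(req.body).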
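For reference, Twilio's documented scheme for form-encoded POST webhooks: take the full URL Twilio requested, append each POST parameter name and value sorted by name, compute an HMAC-SHA1 with your Auth Token as the key, and Base64-encode the result. The sketch below implements that with Node's crypto module for illustration. It does not handle repeated parameter names, and in production the Twilio library's `twilio.validateRequest(authToken, signature, url, params)` or its `twilio.webhook()` Express middleware does this for you:

```javascript
const crypto = require('crypto');

// Compute the X-Twilio-Signature value for a form-encoded POST webhook
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort() // sort parameter names; Twilio's names are plain ASCII
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Constant-time comparison against the header Twilio sent
function isValidTwilioRequest(authToken, signatureHeader, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(signatureHeader || ''));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Because the URL is part of the signed data, an `http://` URL reconstructed behind a TLS-terminating proxy will never match a request Twilio made to `https://`.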

SMS Delivery Lag

Twilio SMS can take 2-8 seconds to deliver. Users hang up before receiving the chatUrl, then complain the link never arrived. This breaks the handoff flow.

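Fix: Keep the call active until the user confirms the text arrived, for example by having the assistant ask before it ends the call. Avoid holding the webhook response itself open with setTimeout(() => res.json({ action: 'end-call' }), 10000): a 10-second delay on the HTTP response risks a webhook timeout. Alternatively, use Twilio's status callbacks to confirm delivery before ending the VAPI call.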
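When you pass a `statusCallback` URL to `twilioClient.messages.create()`, Twilio POSTs form-encoded updates with fields such as `MessageSid`, `MessageStatus`, and, on failure, `ErrorCode`. A sketch that turns one of those bodies into a handoff decision; `classifyStatusCallback` is a hypothetical helper, and which statuses you treat as final is a judgment call:

```javascript
// Parse a Twilio message status callback body (application/x-www-form-urlencoded)
function classifyStatusCallback(rawBody) {
  const params = new URLSearchParams(rawBody);
  const sid = params.get('MessageSid');
  const status = params.get('MessageStatus');
  const errorCode = params.get('ErrorCode');

  if (status === 'delivered') {
    return { sid, outcome: 'delivered' }; // safe to end the voice call
  }
  if (status === 'failed' || status === 'undelivered') {
    // Keep the caller on the line and offer another channel
    return { sid, outcome: 'failed', errorCode };
  }
  // queued, sending, sent, ...: not final yet, keep waiting
  return { sid, outcome: 'pending', status };
}
```

Some carriers never return a delivery receipt, so `sent` can be the last status you see; pair this with a timeout rather than waiting indefinitely for `delivered`.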

Complete Working Example

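This is the full server that handles voice-to-chat handoffs. Copy this entire file, add your credentials, and run it. The code includes webhook validation, real-time transcript monitoring, and Twilio SMS handoff with conversation context. Session state lives in an in-memory Map, so run a single instance, or move the state to a shared store such as Redis as the TL;DR suggests.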

javascript

// server.js - Production Voice-to-Chat Handoff Server
require('dotenv').config(); // load the variables below from .env
const express = require('express');
const crypto = require('crypto');
const twilio = require('twilio');

const app = express();
// Keep the raw body so the signature is computed over the exact bytes received
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

// Environment variables (set these in .env)
const VAPI_API_KEY = process.env.VAPI_API_KEY;
const VAPI_SERVER_SECRET = process.env.VAPI_SERVER_SECRET;
const TWILIO_ACCOUNT_SID = process.env.TWILIO_ACCOUNT_SID;
const TWILIO_AUTH_TOKEN = process.env.TWILIO_AUTH_TOKEN;
const TWILIO_PHONE = process.env.TWILIO_PHONE_NUMBER;

const twilioClient = twilio(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN);

// Session state: tracks active calls and their transcripts (single process only)
const activeSessions = new Map();

// Webhook signature validation (CRITICAL - prevents spoofed requests)
function validateVapiWebhook(req) {
  const signature = String(req.headers['x-vapi-signature'] || '');
  const hash = crypto
    .createHmac('sha256', VAPI_SERVER_SECRET)
    .update(req.rawBody || '')
    .digest('hex');

  // Constant-time comparison; timingSafeEqual requires equal lengths
  const a = Buffer.from(signature);
  const b = Buffer.from(hash);
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) {
    throw new Error('Invalid webhook signature');
  }
}

// Handoff trigger logic: monitors transcript for escalation keywords
function checkHandoffTriggers(transcript) {
  const handoffKeywords = [
    'speak to human', 'real person', 'agent', 
    'representative', 'help me', 'frustrated'
  ];
  
  const lowerTranscript = transcript.toLowerCase();
  return handoffKeywords.some(keyword => lowerTranscript.includes(keyword));
}

// Twilio SMS handoff with conversation context
async function initiateTextHandoff(userPhone, conversationHistory) {
  // Random, unguessable session ID (sequential Date.now() values are easy to enumerate)
  const chatUrl = `https://your-chat-app.com/chat?session=${crypto.randomUUID()}`;

  const contextSummary = conversationHistory
    .slice(-3) // Last 3 turns
    .map(turn => `${turn.role}: ${turn.content}`)
    .join('\n');

  // Twilio caps a message body at 1,600 characters: trim the context, never the link
  const header = 'Your call has been transferred to chat support.\n\nContext:\n';
  const footer = `\n\nContinue here: ${chatUrl}`;
  const maxContext = 1600 - header.length - footer.length;
  const messageBody = header + contextSummary.slice(0, maxContext) + footer;

  try {
    const message = await twilioClient.messages.create({
      body: messageBody,
      from: TWILIO_PHONE,
      to: userPhone
    });

    return { success: true, messageSid: message.sid, chatUrl };
  } catch (error) {
    console.error('Twilio SMS Error:', error.code, error.message);
    throw new Error(`SMS handoff failed: ${error.message}`);
  }
}

// Main webhook handler: processes all VAPI events
app.post('/webhook/vapi', async (req, res) => {
  try {
    validateVapiWebhook(req);
    
    const { message } = req.body;
    const callId = message.call?.id;
    const userPhone = message.call?.customer?.number;
    
    // Initialize session on call start
    if (message.type === 'call-start') {
      activeSessions.set(callId, {
        transcript: [],
        userPhone,
        handoffTriggered: false
      });
      return res.json({ success: true });
    }

    // Monitor transcript for handoff triggers. Skip partial transcripts:
    // they would add duplicate, half-finished turns to the history.
    if (message.type === 'transcript' && callId && message.transcriptType !== 'partial') {
      // Create the session lazily in case the call-start event was missed
      if (!activeSessions.has(callId)) {
        activeSessions.set(callId, { transcript: [], userPhone, handoffTriggered: false });
      }
      const session = activeSessions.get(callId);
      if (session.handoffTriggered) {
        return res.json({ success: true });
      }

      const turn = {
        role: message.role, // 'user' or 'assistant'
        content: message.transcript,
        timestamp: message.timestamp
      };

      session.transcript.push(turn);

      // Check if user requested handoff (web calls have no phone number to text)
      if (message.role === 'user' && session.userPhone && checkHandoffTriggers(message.transcript || '')) {
        session.handoffTriggered = true;

        // Send SMS with context
        const handoffResult = await initiateTextHandoff(
          session.userPhone,
          session.transcript
        );
        console.log(`Handoff SMS ${handoffResult.messageSid} sent for call ${callId}`);

        // Notify assistant to end call gracefully
        return res.json({
          action: 'end-call',
          message: `I've sent you a text message to continue this conversation with a live agent. Check your phone for the link.`
        });
      }
    }
    
    // Cleanup on call end
    if (message.type === 'end-of-call-report') {
      activeSessions.delete(callId);
    }
    
    res.json({ success: true });
    
  } catch (error) {
    console.error('Webhook Error:', error);
    res.status(error.message.includes('signature') ? 401 : 500)
       .json({ error: error.message });
  }
});

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({ 
    status: 'healthy',
    activeCalls: activeSessions.size 
  });
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Voice-to-Chat Handoff Server running on port ${PORT}`);
  console.log(`Webhook URL: https://your-domain.com/webhook/vapi`);
});

Run Instructions

1. Install dependencies:

bash

npm install express twilio dotenv
# crypto is built into Node.js; don't install the deprecated npm package named "crypto"

2. Set environment variables (create .env file):

bash

VAPI_API_KEY=your_vapi_key
VAPI_SERVER_SECRET=your_webhook_secret
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_PHONE_NUMBER=+1234567890
PORT=3000

3. Expose server with ngrok:

bash

ngrok http 3000
# Copy the HTTPS URL (e.g., https://abc123.ngrok.io)

4. Configure VAPI webhook in Dashboard:

Server URL: https://abc123.ngrok.io/webhook/vapi
Server URL Secret: (same as VAPI_SERVER_SECRET)
Subscribe to events: call-start, transcript, end-of-call-report

5. Start the server:

bash

node server.js

What happens in production: User calls your VAPI assistant → says "I need to speak to a human" → server detects keyword → sends SMS with last 3 conversation turns → user clicks link → continues in chat with full context. The assistant ends the call gracefully after confirming the handoff.

FAQ

Technical Questions

How do I detect when to trigger a voice-to-chat handoff during a call?

Monitor final transcript events from VAPI's webhook (message.type === 'transcript', ignoring partials) and match them against the handoffKeywords array (e.g., "speak to human", "real person"). The checkHandoffTriggers() function lowercases the transcript and checks it against these patterns. Alternatively, configure VAPI's function calling so the assistant calls a handoff tool when it determines escalation is needed. This is more reliable than keyword matching because the LLM understands context: a user saying "I need to talk to someone" triggers handoff, but "I'm talking to someone already" doesn't.
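A sketch of the function-calling alternative. It assumes an OpenAI-style function schema for the tool and a server response that lists results by tool-call ID (`{ results: [{ toolCallId, result }] }`). The tool name `request_chat_handoff`, the `{ id, name, arguments }` shape of each extracted call, and the response shape are all assumptions to check against VAPI's custom-tools documentation:

```javascript
// Hypothetical tool the assistant can call when it decides to escalate
const handoffTool = {
  type: 'function',
  function: {
    name: 'request_chat_handoff',
    description: 'Move the caller to SMS chat when they ask for a human or for text support.',
    parameters: {
      type: 'object',
      properties: {
        reason: { type: 'string', description: 'Short reason for the handoff' },
      },
      required: ['reason'],
    },
  },
};

// Build a response for a batch of tool calls (assumed shape: { results: [{ toolCallId, result }] }).
// `toolCalls` is an array of { id, name, arguments } objects extracted from the webhook.
function buildToolResults(toolCalls, handlers) {
  return {
    results: toolCalls.map((call) => {
      const handler = handlers[call.name];
      const result = handler ? handler(call.arguments || {}) : `Unknown tool: ${call.name}`;
      return { toolCallId: call.id, result };
    }),
  };
}
```

The handler is synchronous here to keep the sketch small; a real one would kick off the Twilio send and return a short confirmation string for the assistant to speak.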

What happens to the call state when transitioning from voice to chat?

VAPI keeps the same callId for the life of the call; the session object lives on your server in activeSessions. Before initiating the Twilio chat, capture the contextSummary (last 3-5 turns of conversation) and pass it to the chat agent. Store it with activeSessions.set(callId, ...) (activeSessions is a Map) along with metadata such as { voiceTranscript: transcript, timestamp, userPhone, assistantContext }. The voice call stays up until you end it: the complete example ends it right after sending the SMS, but you can keep it open while the chat invite goes out via Twilio SMS/WhatsApp. The agent can review the context before responding, which reduces repeat explanations.
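The TL;DR suggests Redis with a 24-hour TTL for this state. A minimal in-memory stand-in with the same expiry semantics, suitable for a single process only; `SessionStore` is hypothetical, and with Redis the key's own expiry (for example `SET key value EX 86400`) would do this job:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Single-process session store with per-entry expiry (stand-in for Redis SET ... EX)
class SessionStore {
  constructor(ttlMs = DAY_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // callId -> { value, expiresAt }
  }

  set(callId, value) {
    this.entries.set(callId, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the stored session, or undefined if missing or expired
  get(callId) {
    const entry = this.entries.get(callId);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(callId); // lazily evict expired sessions
      return undefined;
    }
    return entry.value;
  }
}
```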

How do I prevent duplicate messages during handoff?

Set session.handoffTriggered = true as soon as handoff conditions are met, and check it before processing later transcript events: if (session.handoffTriggered) return;. This stops one utterance, or several quick ones, from triggering multiple handoffs. Also implement a 2-second debounce window: ignore handoff keywords within 2 seconds of the last handoff attempt.

Performance

What's the latency impact of voice-to-chat handoff?

Handoff latency breaks down as: VAPI webhook delivery (50-200ms) + keyword detection (10-50ms) + Twilio SMS send (500-2000ms) + agent notification (variable). Total: roughly 560-2250ms before agent notification. To optimize, create the twilioClient instance once and reuse it instead of recreating it per handoff; messages.create() already returns a promise, so no separate async send method is needed.
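The per-stage ranges above add up as follows (agent notification is left out because it is variable). A tiny helper makes the budget easy to recompute once you have measured your own numbers; `sumLatency` is hypothetical:

```javascript
// [min, max] latency in milliseconds for each stage, using the ranges quoted above
const stages = {
  webhookDelivery: [50, 200],
  keywordDetection: [10, 50],
  twilioSend: [500, 2000],
};

// Sum the min and max of every stage into an overall [min, max] range
function sumLatency(stageRanges) {
  return Object.values(stageRanges).reduce(
    ([lo, hi], [stageLo, stageHi]) => [lo + stageLo, hi + stageHi],
    [0, 0]
  );
}

console.log(sumLatency(stages)); // [ 560, 2250 ]
```

The Twilio send dominates both ends of the range, so it is the stage worth measuring first.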

Does the voice call stay active during chat?

Yes. VAPI's call remains open while the chat session starts. You control the disconnect: either keep voice active as a fallback, or end the call once chat is established. If keeping both active, set a timeout—if no chat response within 30 seconds, resume voice assistant or offer callback.
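A sketch of the 30-second fallback: race a promise that resolves when the customer's first chat message arrives against a timer. `waitForChatOrTimeout` and the `chatJoined` promise are hypothetical; how you learn that the customer joined depends on your chat app:

```javascript
// Resolves 'chat' if chatJoined settles first, 'timeout' otherwise.
// The timer is cleared either way so it cannot keep the process alive.
function waitForChatOrTimeout(chatJoined, timeoutMs = 30000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  return Promise.race([chatJoined.then(() => 'chat'), timeout]).finally(() =>
    clearTimeout(timer)
  );
}
```

Usage: `if ((await waitForChatOrTimeout(joined)) === 'timeout') { /* resume the voice assistant or offer a callback */ }`.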

Platform Comparison

Why use VAPI + Twilio instead of Twilio Studio alone?

Twilio Studio handles call routing and IVR, but lacks conversational AI. VAPI provides LLM-powered voice agents with real-time transcription and function calling. Combining them: VAPI handles the intelligent conversation, Twilio handles the chat channel and SMS delivery. VAPI's serverUrl webhook integrates with your backend; Twilio's webhooks handle messaging. This separation of concerns keeps voice logic (VAPI) and messaging logic (Twilio) independent and scalable.

Can I use VAPI's native chat instead of Twilio?

VAPI is voice-first. Its chat API covers text conversations with your assistant, but it doesn't deliver messages over SMS or WhatsApp itself; Twilio provides SMS, WhatsApp, and web chat. If you need multi-channel handoff (voice → SMS → WhatsApp), Twilio is required. If you only need voice → web chat, a custom WebSocket server can work, but Twilio's infrastructure is battle-tested for production scale.

Resources

Twilio: Get Twilio Voice API → https://www.twilio.com/try-twilio

VAPI Documentation

VAPI API Reference – Assistant configuration, function calling, webhook events, call management
VAPI GitHub – SDKs, examples, community integrations

Twilio Documentation

Twilio Programmable Voice – Call control, SIP integration, webhook handling
Twilio Conversations (successor to the retired Twilio Programmable Chat) – Message routing, session management, context preservation

Integration Patterns

Webhook signature validation (HMAC-SHA1 for Twilio; HMAC-SHA256 as used for VAPI in this article)
Session state management across voice-to-chat transitions
Function tool definitions for agent handoff triggers

