How to Secure Voice Agents and Handle PII: A Comprehensive Guide

The 60-second explanation

Voice agents leak PII through three vectors: unencrypted webhooks that expose transcripts in transit, plaintext logs that store SSNs and credit card numbers verbatim, and third-party LLM providers that may retain prompts under their data-retention terms. Securing them requires server-side redaction before transcripts reach your language model, AES-256-GCM encryption for any stored audio or text, and HMAC-SHA256 signature validation on every webhook to reject forged or tampered requests (plus a replay check, because a validly signed request can be resent). VAPI provides the voice infrastructure, but whether it will sign a BAA depends on your plan and contract; either way, you build the compliance layer. The outcome: a production voice agent that can handle PHI, payment data, and identity documents with far less regulatory exposure.

The wire format

Every VAPI call produces a stream of server events delivered to your webhook, roughly: call status updates → transcript (many) → end-of-call-report. The transcript event contains message.transcript in plaintext, and this is where PII enters your system. Transcript webhooks are notifications, so redacting them protects your logs and storage; to keep raw PII away from the LLM as well, your server has to sit in the model path (for example, by pointing the assistant's model at an endpoint you control) and redact there. The flow:

User speaks → VAPI transcriber (Deepgram/AssemblyAI) converts to text
VAPI POSTs {"message": {"type": "transcript", "transcript": "..."}} to your webhook
Your server validates the HMAC signature in the x-vapi-signature header
Regex or NER model scans the transcript for SSN/credit card/email patterns
Redacted transcript is stored encrypted and forwarded on the model path
LLM receives sanitized context, never sees raw PII

mermaid

sequenceDiagram
    participant User
    participant VAPI
    participant Webhook
    participant PII Scanner
    participant LLM
    
    User->>VAPI: Audio stream
    VAPI->>Webhook: POST /webhook {"transcript": "SSN is 123-45-6789"}
    Webhook->>Webhook: Validate HMAC signature
    Webhook->>PII Scanner: Scan transcript
    PII Scanner-->>Webhook: Return "[SSN_REDACTED]"
    Webhook->>LLM: Forward sanitized text
    LLM-->>VAPI: Generate response
    VAPI->>User: TTS audio

Critical: VAPI sends partial transcripts during speech. Each fragment must be scanned independently—batching creates a window where raw PII sits in memory unredacted.

Walkthrough

1. Configure the assistant with security defaults

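The systemPrompt is your first line of defense, and the weakest one: LLMs tend to repeat whatever they hear unless told not to, and instructions alone are no guarantee, so treat the prompt as a backstop to server-side redaction. Set recordingEnabled: false to prevent VAPI from storing raw audio that contains PII. The hipaaEnabled flag (if available on your plan) puts the assistant into VAPI's HIPAA mode; confirm what it changes and whether a BAA covers it, and you still need server-side redaction. Field names and nesting for these settings have changed between VAPI API versions, so check them against the current assistant schema.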

javascript

const secureAssistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    systemPrompt: "You are a HIPAA-compliant assistant. NEVER repeat SSN, credit card numbers, or medical record IDs verbatim. Use phrases like 'the number ending in 1234' instead.",
    temperature: 0.3  // More deterministic output; not a PII control on its own
  },
  voice: {
    provider: "elevenlabs",
    voiceId: "21m00Tcm4TlvDq8ikWAM"
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2-medical",  // Medical vocabulary improves PHI detection accuracy
    keywords: ["SSN", "social security", "date of birth", "medical record"]
  },
  recordingEnabled: false,  // CRITICAL: Disable if handling PII
  hipaaEnabled: true,
  serverUrl: "https://your-domain.com/webhook/vapi",
  serverUrlSecret: process.env.VAPI_WEBHOOK_SECRET
};

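Why nova-2-medical matters: general-purpose STT models can mishear domain phrases (for example, "social security number" transcribed as "social security member"), and a misheard keyword can mean a missed redaction. A medically tuned model tends to handle clinical vocabulary more reliably, and because redaction runs on the transcript, transcription accuracy puts a ceiling on redaction accuracy.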
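Because a single flipped flag can undo these defaults, it can help to fail closed before the assistant is ever created. The sketch below is a hypothetical helper, `assertSecureConfig`, that inspects only the fields used in the config above; it does not validate against VAPI's live schema, where some of these settings may be nested differently.

```javascript
// Hypothetical preflight check: refuse to create an assistant whose config
// would store raw audio, skip HTTPS, or skip webhook authentication.
function assertSecureConfig(config) {
  const problems = [];

  if (config.recordingEnabled !== false) {
    problems.push('recordingEnabled must be explicitly false');
  }
  if (!config.serverUrl || !config.serverUrl.startsWith('https://')) {
    problems.push('serverUrl must use https://');
  }
  if (!config.serverUrlSecret) {
    problems.push('serverUrlSecret is missing (is VAPI_WEBHOOK_SECRET set?)');
  }
  if (!config.model || !config.model.systemPrompt) {
    problems.push('model.systemPrompt should carry the PII instructions');
  }

  if (problems.length > 0) {
    throw new Error('Insecure assistant config:\n- ' + problems.join('\n- '));
  }
  return true;
}
```

Call it immediately before the create-assistant request, for example `assertSecureConfig(secureAssistantConfig)`, so a missing environment variable stops deployment instead of silently disabling webhook authentication.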

2. Build the webhook handler with signature validation

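Attackers will send fake webhooks to inject malicious transcripts or exfiltrate data, so HMAC validation is mandatory. Compare signatures with crypto.timingSafeEqual: a standard === comparison can return as soon as it hits the first differing character, so response time leaks how much of a forged signature was correct. Note that timingSafeEqual throws if the two buffers differ in length, so check the length first.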

javascript

const express = require('express');
const crypto = require('crypto');
const app = express();

app.use(express.json());

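function validateWebhook(payload, signature, secret) {
  // Simplified: re-serializing the parsed body may not reproduce the exact
  // bytes VAPI signed. Hash the raw body in production (see the gotcha below).
  const expected = Buffer.from(
    crypto
      .createHmac('sha256', secret)
      .update(JSON.stringify(payload))
      .digest('hex')
  );
  const received = Buffer.from(String(signature || ''));

  // timingSafeEqual throws on a length mismatch (including a missing header),
  // so reject those cases before comparing.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}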

app.post('/webhook/vapi', (req, res) => {
  const signature = req.headers['x-vapi-signature'];
  
  if (!validateWebhook(req.body, signature, process.env.VAPI_WEBHOOK_SECRET)) {
    console.error('Invalid signature - possible attack');
    return res.status(401).json({ error: 'Unauthorized' });
  }
  
  // Process webhook safely
  res.json({ status: 'received' });
});
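An HMAC proves a request came from someone holding the secret, but it does not stop an attacker who captured one validly signed request from replaying it. Below is a minimal in-memory replay guard, assuming each legitimate delivery carries a distinct signature; `createReplayGuard` and the 5-minute window are illustrative choices, not a VAPI feature. A legitimate retry of an identical payload would also be rejected, and a multi-instance deployment would need a shared store (such as Redis) instead of a `Map`.

```javascript
// Hypothetical in-memory replay guard: remembers signatures it has accepted
// and reports any repeat seen within `windowMs` as a replay.
function createReplayGuard(windowMs = 5 * 60 * 1000) {
  const seen = new Map(); // signature -> time first accepted (ms)

  return function isReplay(signature, now = Date.now()) {
    // Drop entries older than the window so the map cannot grow forever.
    for (const [sig, acceptedAt] of seen) {
      if (now - acceptedAt > windowMs) seen.delete(sig);
    }
    if (seen.has(signature)) return true;
    seen.set(signature, now);
    return false;
  };
}
```

Create one guard at startup (`const isReplay = createReplayGuard();`) and check it only after `validateWebhook` succeeds, returning 409 on a repeat. Once an entry ages out of the window it is forgotten, so if your webhook headers include a timestamp, rejecting stale timestamps as well closes that gap.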

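Production gotcha: express.json() replaces the body with a parsed object, and calling JSON.stringify on that object is not guaranteed to reproduce the bytes VAPI signed (whitespace, key order, and character escaping can differ), so the signature check fails. Use express.raw() for the webhook route, validate against the raw Buffer, then parse JSON manually after validation.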
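The fix amounts to hashing the exact bytes that arrived. Here is a minimal sketch of that verification step, written independently of Express; in Express, `rawBody` would be `req.body` on a route mounted with `express.raw({ type: 'application/json' })`. It assumes, as the handler above does, that the header carries a hex-encoded HMAC-SHA256 of the body; confirm the header name and encoding your VAPI server settings actually use. `verifyRawBody` and `parseVerified` are hypothetical names.

```javascript
const crypto = require('crypto');

// Verify a hex-encoded HMAC-SHA256 signature over the raw request bytes.
// Returns false (never throws) for missing or malformed signatures.
function verifyRawBody(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string' || !/^[0-9a-f]{64}$/i.test(signatureHex)) {
    return false;
  }
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // Both buffers are 32 bytes at this point, so timingSafeEqual cannot throw.
  return crypto.timingSafeEqual(expected, received);
}

// Parse JSON only after the signature over the raw bytes has been checked.
function parseVerified(rawBody, signatureHex, secret) {
  if (!verifyRawBody(rawBody, signatureHex, secret)) return null;
  const text = Buffer.isBuffer(rawBody) ? rawBody.toString('utf8') : rawBody;
  return JSON.parse(text);
}
```

A `null` return maps naturally to a 401 response; anything else is the parsed payload, which is now safe to hand to the redaction step.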

3. Implement real-time PII redaction

Regex works well for PII with a fixed shape (SSNs, credit card numbers, emails) but cannot recognize names or addresses; for those, add an NER model. Run regex synchronously in the webhook response path: async scanning creates a race condition where partial transcripts leak to logs before redaction completes.

javascript

const PII_PATTERNS = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g  // [A-Za-z]: a "|" inside [] would match a literal pipe
};

function redactPII(transcript) {
  let redacted = transcript;
  const detectedEntities = [];
  
  Object.entries(PII_PATTERNS).forEach(([type, pattern]) => {
    redacted = redacted.replace(pattern, (match) => {
      detectedEntities.push({ type, value: match }); // raw PII: keep in memory only, never log or persist
      return `[${type.toUpperCase()}_REDACTED]`;
    });
  });
  
  return { redacted, detectedEntities };
}

app.post('/webhook/vapi', (req, res) => {
  const signature = req.headers['x-vapi-signature'];
  if (!validateWebhook(req.body, signature, process.env.VAPI_WEBHOOK_SECRET)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  
  const { message } = req.body;
  
  if (message.type === 'transcript') {
    const { redacted, detectedEntities } = redactPII(message.transcript);
    
    // Log ONLY the redacted version
    console.log('Sanitized transcript:', redacted);
    console.warn(`Detected ${detectedEntities.length} PII entities in call ${message.callId}`);
    
    return res.json({
      transcript: redacted,
      redacted: detectedEntities.length > 0
    });
  }
  
  res.sendStatus(200);
});
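One way to make "log only the redacted version" hard to get wrong is to route all transcript logging through a wrapper that redacts first and records per-type counts, never the matched values. Below is a minimal sketch; `redactWithCounts` and `safeLog` are hypothetical helpers, and the patterns mirror step 3, with the email TLD class written as `[A-Za-z]` (inside brackets, `|` is a literal pipe, not alternation).

```javascript
const PII_PATTERNS = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g
};

// Redact every pattern and count matches per type without keeping the values.
function redactWithCounts(text) {
  const counts = {};
  let redacted = text;
  for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
    redacted = redacted.replace(pattern, () => {
      counts[type] = (counts[type] || 0) + 1;
      return `[${type.toUpperCase()}_REDACTED]`;
    });
  }
  return { redacted, counts };
}

// Hypothetical logging wrapper: the only path by which transcript text reaches a log sink.
function safeLog(callId, text, sink = console.log) {
  const { redacted, counts } = redactWithCounts(String(text));
  const line = JSON.stringify({ callId, transcript: redacted, piiCounts: counts });
  sink(line);
  return line;
}
```

Pointing a lint rule or code review at any direct `console.log(message.transcript)` then becomes a simple, mechanical check.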

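False negatives kill compliance: "My social is 123 45 6789" (spaces instead of dashes) bypasses the regex. Stripping all whitespace first (transcript.replace(/\s+/g, '')) does not fix this, because it glues the digits to the surrounding words: the \b boundaries disappear and the pattern still requires dashes. Instead, allow an optional space or dash between digits, and convert spelled-out digits ("one two three") to numerals before matching.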
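Both steps can be sketched as follows, assuming the transcriber emits either numerals with arbitrary space/dash separators or spelled-out digit words. `normalizeSpokenDigits`, `SSN_LOOSE`, and `redactSSN` are hypothetical names. The looser pattern will also flag some nine-digit numbers that are not SSNs, which is usually the right trade-off for redaction; "oh" for zero is deliberately not mapped, because it would rewrite ordinary speech.

```javascript
// Hypothetical normalizer: spelled-out digits -> numerals ("one two" -> "1 2").
const DIGIT_WORDS = {
  zero: '0', one: '1', two: '2', three: '3', four: '4',
  five: '5', six: '6', seven: '7', eight: '8', nine: '9'
};
const DIGIT_WORD_RE = /\b(zero|one|two|three|four|five|six|seven|eight|nine)\b/gi;

function normalizeSpokenDigits(text) {
  return text.replace(DIGIT_WORD_RE, (word) => DIGIT_WORDS[word.toLowerCase()]);
}

// Exactly nine digits, each optionally separated by a single space or dash.
// The \b anchors stop it from matching inside longer digit runs (phones, cards).
const SSN_LOOSE = /\b\d(?:[\s-]?\d){8}\b/g;

function redactSSN(transcript) {
  return normalizeSpokenDigits(transcript).replace(SSN_LOOSE, '[SSN_REDACTED]');
}
```

Because normalization changes the text (for example, "no one" becomes "no 1"), return or store the normalized-and-redacted string consistently rather than mixing it with the raw transcript.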

4. Encrypt data at rest

If you must store call recordings (legal hold requirements), encrypt before writing to disk. Never store encryption keys in the same database as encrypted data—use AWS KMS, HashiCorp Vault, or a separate key management service.

javascript

const { createCipheriv, randomBytes } = require('crypto');

function encryptAudio(audioBuffer) {
  const algorithm = 'aes-256-gcm';
  const key = Buffer.from(process.env.ENCRYPTION_KEY, 'hex'); // 32 bytes
  const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM; never reuse with the same key
  
  const cipher = createCipheriv(algorithm, key, iv);
  const encrypted = Buffer.concat([
    cipher.update(audioBuffer),
    cipher.final()
  ]);
  const authTag = cipher.getAuthTag();

  return {
    iv: iv.toString('hex'),
    authTag: authTag.toString('hex'),
    data: encrypted.toString('base64'),
    keyVersion: 'v2'  // Track for key rotation
  };
}
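Decryption has to reverse every stored field: the same key, the stored IV, and the auth tag, which GCM verifies in `final()` so that tampered ciphertext throws instead of returning corrupted audio. Below is a minimal round-trip sketch; it uses a 12-byte IV (the size NIST SP 800-38D recommends for GCM) and takes the key as a hex argument rather than reading `ENCRYPTION_KEY`, so `encryptBuffer` and `decryptBuffer` (hypothetical names) can be exercised in isolation.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('crypto');

// Encrypt with AES-256-GCM; keyHex is a 64-character hex string (32 bytes).
function encryptBuffer(plain, keyHex, keyVersion) {
  const key = Buffer.from(keyHex, 'hex');
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plain), cipher.final()]);
  return {
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
    data: data.toString('base64'),
    keyVersion
  };
}

// Decrypt; throws if the key is wrong or the ciphertext, IV, or tag was modified.
function decryptBuffer(blob, keyHex) {
  const key = Buffer.from(keyHex, 'hex');
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(blob.iv, 'hex'));
  decipher.setAuthTag(Buffer.from(blob.authTag, 'hex'));
  return Buffer.concat([
    decipher.update(Buffer.from(blob.data, 'base64')),
    decipher.final() // the authentication check happens here
  ]);
}
```

Treat a thrown error from `decryptBuffer` as a security event (wrong key version or tampering), not as a recoverable parsing failure.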

Key rotation: HIPAA does not prescribe a rotation interval, but most security programs set one (90 days is a common policy). Store keyVersion with each encrypted blob so you can still decrypt old recordings after rotation. Implement a key store with versioned keys:

javascript

const keyStore = {
  'v1': process.env.ENCRYPTION_KEY_V1,
  'v2': process.env.ENCRYPTION_KEY_V2,
  current: 'v2'
};
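With versioned keys, decryption looks up the key named in each blob, while new writes always use `current`; rotating then means re-encrypting old blobs at your own pace. Below is a minimal sketch, assuming the `keyStore` shape above (hex key strings per version plus a `current` pointer). `keyFor`, `encryptVersioned`, `decryptVersioned`, and `reencrypt` are hypothetical helpers, and in production the key material would come from a KMS rather than environment variables.

```javascript
const crypto = require('crypto');

// Look up a version's key material; 'current' is a pointer, not a key.
function keyFor(keyStore, version) {
  if (version === 'current' || !keyStore[version]) {
    throw new Error(`Unknown key version: ${version}`);
  }
  return Buffer.from(keyStore[version], 'hex');
}

// New data is always written under the current key version.
function encryptVersioned(keyStore, plain) {
  const keyVersion = keyStore.current;
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', keyFor(keyStore, keyVersion), iv);
  const data = Buffer.concat([cipher.update(plain), cipher.final()]);
  return {
    keyVersion,
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
    data: data.toString('base64')
  };
}

// Existing data is read with whichever version it was written under.
function decryptVersioned(keyStore, blob) {
  const iv = Buffer.from(blob.iv, 'hex');
  const decipher = crypto.createDecipheriv('aes-256-gcm', keyFor(keyStore, blob.keyVersion), iv);
  decipher.setAuthTag(Buffer.from(blob.authTag, 'hex'));
  return Buffer.concat([decipher.update(Buffer.from(blob.data, 'base64')), decipher.final()]);
}

// One rotation step: move a blob onto the current key (no-op if already current).
function reencrypt(keyStore, blob) {
  if (blob.keyVersion === keyStore.current) return blob;
  return encryptVersioned(keyStore, decryptVersioned(keyStore, blob));
}
```

Only retire an old key version once a background job has run `reencrypt` over every blob that still references it.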

5. Handle barge-in scenarios

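VAPI streams partial transcripts while the user is still speaking, including when they interrupt the assistant mid-sentence. This guide reads the flag as isPartial: true; depending on your API version it may instead arrive as transcriptType: 'partial', so check a real payload. These fragments can contain the most sensitive data, since users often interrupt to correct an SSN or add card details. Each partial must be redacted independently.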

javascript

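app.post('/webhook/vapi', (req, res) => {
  // Signature validation from step 2 omitted for brevity; keep it in real handlers.
  const { message } = req.body;
  
  if (message.type === 'transcript') {
    const { redacted } = redactPII(message.transcript);
    // Accept either flag style: isPartial (used in this guide) or transcriptType
    const isPartial = message.isPartial === true || message.transcriptType === 'partial';
    
    // Log with partial flag (redacted text only)
    console.log({
      timestamp: Date.now(),
      role: message.role,
      transcript: redacted,
      isPartial
    });
    
    // If user interrupted, signal to stop TTS. Hypothetical response shape:
    // VAPI detects barge-in itself, so confirm your response can affect TTS.
    if (message.role === 'user' && isPartial) {
      return res.json({ 
        action: 'interrupt',
        clearBuffer: true 
      });
    }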
  }
  
  res.sendStatus(200);
});
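Because the partial flag's name can differ between payload versions, it helps to normalize it in one place and make the redact-first rule structural rather than a convention. Below is a minimal sketch under that assumption; `isPartialTranscript` and `processTranscriptEvent` are hypothetical helpers, `redact` is whatever redaction function you use (such as `redactPII` wrapped to return a string), and only final transcripts are marked for storage so partials are never persisted.

```javascript
// Treat either flag style as "partial"; anything else is considered final.
function isPartialTranscript(message) {
  return message.isPartial === true || message.transcriptType === 'partial';
}

// Redact every fragment before anything else touches it, then decide what to keep.
function processTranscriptEvent(message, redact) {
  const redacted = redact(String(message.transcript || ''));
  const partial = isPartialTranscript(message);
  return {
    role: message.role,
    transcript: redacted,
    isPartial: partial,
    persist: !partial // store finals only; partials are logged (redacted) or dropped
  };
}
```

Since the raw `message.transcript` is read in exactly one place, any later logging or storage code can only ever see the redacted string.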

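Latency impact: regex redaction on a sentence-length transcript typically costs well under a millisecond, so it is safe to run in the critical path; measure it on your own traffic. Define patterns once at server startup rather than per request. NER models (spaCy, AWS Comprehend Medical) cost considerably more, especially when called over the network, so run them asynchronously for audit logs, not in the critical path.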
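Rather than rely on a published latency figure, you can time redaction on your own transcripts. Below is a minimal sketch using `process.hrtime.bigint()`; `measureRedaction` is a hypothetical helper and the sample sentence is illustrative. The patterns are created once at module load, which is all "pre-compiling at startup" amounts to for JavaScript regex literals.

```javascript
// Patterns created once at module load and reused for every call.
const PATTERNS = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]'],
  [/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CREDITCARD_REDACTED]']
];

function redact(text) {
  return PATTERNS.reduce((acc, [re, label]) => acc.replace(re, label), text);
}

// Average time per redaction in microseconds over `iterations` runs.
function measureRedaction(sample, iterations = 10000) {
  let last = '';
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) last = redact(sample);
  const elapsedNs = process.hrtime.bigint() - start;
  return { output: last, avgMicros: Number(elapsedNs) / iterations / 1000 };
}

const result = measureRedaction('My SSN is 123-45-6789 and my card is 4532 1234 5678 9010');
console.log(`${result.avgMicros.toFixed(2)} microseconds per transcript:`, result.output);
```

Run it with samples drawn from real (already redacted or synthetic) transcripts, since very long utterances and pathological inputs are what move the number.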

Everything in one file

This configuration object collects the assistant-side security settings from the steps above; it complements the server-side signature validation and redaction rather than replacing them. Copy this into your assistant creation call. The serverUrlSecret must match the environment variable your webhook uses for HMAC validation.

javascript

const productionAssistantConfig = {
  // LLM configuration with PII-aware prompt
  model: {
    provider: "openai",
    model: "gpt-4",
    systemPrompt: "You are a HIPAA-compliant assistant. NEVER repeat SSN, credit card numbers, or medical record IDs verbatim. Use phrases like 'the number ending in 1234' instead. If asked to confirm sensitive data, say 'I see that information' without repeating it.",
    temperature: 0.3,  // More deterministic output; redaction, not temperature, prevents PII leaks
    maxTokens: 150
  },
  
  // Voice synthesis
  voice: {
    provider: "elevenlabs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
    stability: 0.5,
    similarityBoost: 0.75
  },
  
  // STT with medical vocabulary
  transcriber: {
    provider: "deepgram",
    model: "nova-2-medical",  // Medical vocabulary; improves transcription of PHI-related terms
    language: "en-US",
    keywords: ["SSN", "social security", "date of birth", "medical record", "insurance ID"]
  },
  
  // Security controls
  recordingEnabled: false,  // CRITICAL: Disable to prevent PII storage
  hipaaEnabled: true,       // HIPAA mode, if your plan offers it; confirm BAA coverage separately
  endCallOnSilence: true,   // Prevents open-mic PII leakage (verify this field in your API version's schema)
  silenceTimeoutSeconds: 30,
  
  // Webhook configuration
  serverUrl: process.env.WEBHOOK_URL,  // "https://your-domain.com/webhook/vapi"
  serverUrlSecret: process.env.VAPI_WEBHOOK_SECRET,  // For HMAC validation
  
  // Call metadata
  metadata: {
    environment: process.env.NODE_ENV,
    version: "2.1.0",
    complianceMode: "HIPAA"
  }
};

Tradeoffs: endCallOnSilence: true prevents users from leaving the line open and accidentally disclosing PII, but it will also hang up on callers who pause for a long time. Set silenceTimeoutSeconds to 30 for healthcare (longer pauses while patients look up information) and around 15 for customer service.

Test locally

Start ngrok to expose your webhook, then send a signed test webhook containing a fake SSN. Verify that your server logs show only the redacted transcript: the raw value should never reach your logs or your LLM.

bash

# Terminal 1: Start your server
node server.js

# Terminal 2: Expose webhook
ngrok http 3000

# Terminal 3: Send a signed test webhook (sign the exact body you send)
BODY='{"message":{"type":"transcript","callId":"abc123","transcript":"My SSN is 123-45-6789"}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "your_webhook_secret" | awk '{print $NF}')
curl -X POST https://your-ngrok-url.ngrok.io/webhook/vapi \
  -H "Content-Type: application/json" \
  -H "x-vapi-signature: $SIG" \
  -d "$BODY"

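Expected response (from the step 3 handler; the complete example below returns an action field alongside the transcript):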

json

{
  "transcript": "My SSN is [SSN_REDACTED]",
  "redacted": true
}

Check server logs for:

Sanitized transcript: My SSN is [SSN_REDACTED]
Detected 1 PII entities in call abc123

If you see "Invalid signature - possible attack", check that the secret you sign with matches process.env.VAPI_WEBHOOK_SECRET exactly (and, for real calls, the serverUrlSecret configured in VAPI). The secret must match byte for byte: trailing whitespace or newlines will break validation.
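If the shell pipeline is in doubt, the same signed request can be built in Node, which guarantees that the bytes you sign are the bytes you send. Below is a minimal sketch, assuming the hex HMAC-SHA256 scheme and `x-vapi-signature` header used throughout this guide; `buildSignedRequest` is a hypothetical helper that only builds the request, leaving the sending to `fetch` or any HTTP client.

```javascript
const crypto = require('crypto');

// Build headers and body for a signed test webhook; sign the exact string sent.
function buildSignedRequest(payload, secret) {
  const body = JSON.stringify(payload);
  const signature = crypto.createHmac('sha256', secret).update(body).digest('hex');
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'x-vapi-signature': signature
    }
  };
}

const req = buildSignedRequest(
  { message: { type: 'transcript', callId: 'abc123', transcript: 'My SSN is 123-45-6789' } },
  process.env.VAPI_WEBHOOK_SECRET || 'your_webhook_secret'
);
console.log(req.headers['x-vapi-signature']);
```

On Node 18 or later, `await fetch(url, { method: 'POST', headers: req.headers, body: req.body })` sends it; comparing the printed signature with the one your server computes quickly shows whether the secret or the body is the mismatch.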

Test PII detection accuracy:

bash

# Test credit card redaction
BODY='{"message":{"type":"transcript","transcript":"Card is 4532-1234-5678-9010"}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "your_webhook_secret" | awk '{print $NF}')
curl -X POST https://your-ngrok-url.ngrok.io/webhook/vapi \
  -H "Content-Type: application/json" \
  -H "x-vapi-signature: $SIG" \
  -d "$BODY"

# Expected: "Card is [CREDITCARD_REDACTED]"

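Free-tier ngrok URLs can change each time you restart ngrok, which means updating serverUrl every time. For extended testing, use a reserved or static domain (for example, ngrok http 3000 --domain=your-domain; availability depends on your ngrok plan).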

Footguns

Logging raw transcripts before redaction: Most devs log message.transcript directly for debugging. When a user says "My SSN is 123-45-6789", that hits CloudWatch unredacted. HIPAA violation in 0.3 seconds. Fix: Always log the redacted version. Never log raw payloads.

Skipping webhook signature validation: signature checks commented out during development and never re-enabled are a common way these endpoints get exposed. Attackers send fake webhooks to /webhook/vapi with malicious transcripts. Fix: Validate EVERY webhook. Return 401 on signature mismatch. No exceptions.

Using express.json() before signature validation: The middleware parses the body and modifies it. The HMAC signature fails because you're hashing the parsed object, not the raw bytes. Fix: Use express.raw({ type: 'application/json' }) for webhook routes, manually parse JSON after validation.

Batching transcript redaction: Processing multiple transcripts in a batch creates a window where raw PII sits in memory. If the server crashes mid-batch, unredacted data hits logs. Fix: Redact synchronously in the webhook response path. Regex redaction adds negligible latency, far less than the cost of a leak.

Hard-coding encryption keys: Storing AES keys in config.js or .env files committed to Git fails every compliance audit. Fix: Use AWS KMS, HashiCorp Vault, or Azure Key Vault. Rotate keys on a defined schedule (90 days is a common policy) and store keyVersion with encrypted data.

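Regex false negatives: "My social is 123 45 6789" (spaces) bypasses /\b\d{3}-\d{2}-\d{4}\b/, and spelled-out digits ("one two three...") bypass any digit pattern. Fix: allow an optional space or dash between digits and convert spoken digit words to numerals before matching. Do not strip all whitespace with transcript.replace(/\s+/g, ''): that glues digits to neighboring words and breaks the \b boundaries. For names and addresses, add an NER model (spaCy, AWS Comprehend Medical).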

Ignoring partial transcripts: VAPI streams partial transcripts while the user is speaking, including during barge-ins (this guide reads them as isPartial: true; some payloads use transcriptType: 'partial'). These fragments can contain the most sensitive data, since users interrupt to correct SSNs. If you only redact final transcripts, partials leak to logs. Fix: Redact every transcript event, partial or final.

Complete working example

This server handles secure voice agent calls with PII redaction, webhook validation, and encrypted storage. Paste this into server.js, configure environment variables, and run.

javascript

const express = require('express');
const crypto = require('crypto');
const app = express();

// Environment configuration (fail fast if secrets are missing)
const VAPI_WEBHOOK_SECRET = process.env.VAPI_WEBHOOK_SECRET;
if (!VAPI_WEBHOOK_SECRET || !process.env.ENCRYPTION_KEY) {
  throw new Error('VAPI_WEBHOOK_SECRET and ENCRYPTION_KEY must be set');
}
const ENCRYPTION_KEY = Buffer.from(process.env.ENCRYPTION_KEY, 'hex'); // 32 bytes
if (ENCRYPTION_KEY.length !== 32) {
  throw new Error('ENCRYPTION_KEY must be 64 hex characters (32 bytes)');
}
const PORT = process.env.PORT || 3000;

// PII detection patterns (SSN accepts dash or space separators)
const PII_PATTERNS = {
  ssn: /\b\d{3}[\s-]?\d{2}[\s-]?\d{4}\b/g,
  creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g
};

// Webhook signature validation over the exact raw bytes received.
// Assumes a hex HMAC-SHA256 in x-vapi-signature; confirm against your VAPI settings.
function validateWebhook(rawBody, signature) {
  const expected = crypto
    .createHmac('sha256', VAPI_WEBHOOK_SECRET)
    .update(rawBody)
    .digest('hex');
  const received = Buffer.from(String(signature || ''));
  // timingSafeEqual throws on a length mismatch, so check the length first
  if (received.length !== Buffer.byteLength(expected)) return false;
  return crypto.timingSafeEqual(received, Buffer.from(expected));
}

// PII redaction with entity tracking (matched values are not kept, so they cannot be logged)
function redactPII(transcript) {
  let redacted = String(transcript || '');
  const entities = [];
  
  Object.entries(PII_PATTERNS).forEach(([type, pattern]) => {
    redacted = redacted.replace(pattern, () => {
      entities.push({ type, timestamp: Date.now() });
      return `[${type.toUpperCase()}_REDACTED]`;
    });
  });
  
  return { redacted, entities };
}

// AES-256-GCM encryption
function encryptData(data) {
  const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', ENCRYPTION_KEY, iv);
  
  let encrypted = cipher.update(data, 'utf8', 'hex');
  encrypted += cipher.final('hex');
  const authTag = cipher.getAuthTag();
  
  return {
    encrypted,
    iv: iv.toString('hex'),
    authTag: authTag.toString('hex'),
    keyVersion: 'v2'
  };
}

// Webhook handler: raw body, so the signature covers the exact bytes received
app.post('/webhook/vapi', express.raw({ type: 'application/json' }), (req, res) => {
  const signature = req.headers['x-vapi-signature'];
  
  if (!Buffer.isBuffer(req.body) || !validateWebhook(req.body, signature)) {
    console.error('Invalid webhook signature - possible attack');
    return res.status(401).json({ error: 'Unauthorized' });
  }
  
  // Parse JSON only after the signature check passes
  let message;
  try {
    ({ message } = JSON.parse(req.body.toString('utf8')));
  } catch {
    return res.status(400).json({ error: 'Malformed JSON' });
  }
  if (!message) return res.status(400).json({ error: 'Missing message' });
  
  // Handle transcript events
  if (message.type === 'transcript') {
    const { redacted, entities } = redactPII(message.transcript);
    
    // Encrypt the redacted transcript for storage
    const encrypted = encryptData(redacted);
    
    console.log('Redacted transcript:', redacted);
    console.log('Detected PII entities:', entities.length);
    
    // Store encrypted data in your database here
    // Example: await db.transcripts.insert({ callId: message.callId, ...encrypted });
    
    // Partial flag: isPartial (this guide) or transcriptType on some payloads
    const isPartial = message.isPartial === true || message.transcriptType === 'partial';
    
    // Barge-in: hypothetical response shape. VAPI handles interruptions itself,
    // so confirm that your server response can influence TTS at all.
    if (message.role === 'user' && isPartial) {
      return res.json({
        action: 'interrupt',
        clearBuffer: true,
        transcript: redacted
      });
    }
    
    return res.json({
      action: 'continue',
      transcript: redacted
    });
  }
  
  // Handle end-of-call report (field names vary by API version)
  if (message.type === 'end-of-call-report') {
    console.log('Call ended:', message.callId);
    console.log('Duration:', message.duration, 'seconds');
    console.log('Cost:', message.cost);
  }
  
  res.json({ status: 'received' });
});

// Health check
app.get('/health', (req, res) => {
  res.json({ 
    status: 'healthy', 
    timestamp: Date.now(),
    encryption: 'AES-256-GCM',
    piiRedaction: 'enabled'
  });
});

// Start server
app.listen(PORT, () => {
  console.log(`Secure voice agent server running on port ${PORT}`);
  console.log('Webhook endpoint: /webhook/vapi');
  console.log('PII redaction: ENABLED');
  console.log('Encryption: AES-256-GCM');
});

// Graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  process.exit(0);
});

Run it:

Install dependencies: npm install express
Generate encryption key: node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

Set environment variables:

bash

export VAPI_WEBHOOK_SECRET="your_webhook_secret"
export ENCRYPTION_KEY="generated_32_byte_hex_key"
export PORT=3000

Start server: node server.js
Expose with ngrok: ngrok http 3000
Configure VAPI assistant serverUrl to your ngrok URL + /webhook/vapi

The server validates all webhooks using HMAC-SHA256, redacts PII in real time with regex patterns, encrypts redacted transcripts with AES-256-GCM before storage, and signals barge-in interrupts. Treat it as a starting point for HIPAA-oriented voice agents handling credit cards, SSNs, and health data, not as compliance on its own: access controls, audit logging, retention policies, and a BAA (see the FAQ) are still your responsibility.

FAQ

Does VAPI natively redact PII in voice transcripts? Don't rely on it. VAPI encrypts traffic in transit with TLS, but this guide assumes the transcripts it sends you are unfiltered. Implement server-side redaction by intercepting the transcript webhook event, scanning for SSN/credit card patterns with regex or NER models, and replacing matches with [REDACTED] tokens before the text reaches your LLM. This keeps PII out of logs and third-party services.

Can I achieve HIPAA compliance using VAPI alone? No. First confirm whether VAPI will sign a BAA for your plan; without one, you cannot send it PHI. Beyond that, you need encryption at rest (AES-256-GCM for stored audio and transcripts), tamper-evident audit logs (for example, SHA-256 hash chains), role-based access controls, and data retention policies that auto-purge recordings after 30-90 days. There is no official HIPAA certification for infrastructure, so terminate webhooks on infrastructure covered by your cloud provider's BAA and encrypt all transcript payloads before storing them in a BAA-covered database such as Amazon RDS.
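A common way to make an audit log tamper-evident is a hash chain: each entry records the hash of the previous entry and is itself hashed, so editing or deleting any record breaks verification from that point on. Below is a minimal in-memory sketch of the idea; `AuditLog` is a hypothetical class, entries should contain only redacted data, and a real deployment would persist entries to append-only storage and anchor the latest hash somewhere an attacker cannot rewrite.

```javascript
const crypto = require('crypto');

const GENESIS = '0'.repeat(64);

class AuditLog {
  constructor() {
    this.entries = [];
  }

  // Each record commits to the previous record's hash.
  append(event) {
    const prevHash = this.entries.length ? this.entries[this.entries.length - 1].hash : GENESIS;
    const record = { seq: this.entries.length, at: new Date().toISOString(), event, prevHash };
    const hash = crypto.createHash('sha256').update(JSON.stringify(record)).digest('hex');
    this.entries.push({ ...record, hash });
    return hash;
  }

  // Recompute every hash; returns the index of the first bad entry, or -1.
  verify() {
    let prevHash = GENESIS;
    for (let i = 0; i < this.entries.length; i++) {
      const { hash, ...record } = this.entries[i];
      const expected = crypto.createHash('sha256').update(JSON.stringify(record)).digest('hex');
      if (record.prevHash !== prevHash || hash !== expected) return i;
      prevHash = hash;
    }
    return -1;
  }
}
```

Note that a hash chain detects tampering but does not prevent it; someone who can rewrite the whole chain can recompute every hash, which is why the latest hash needs to be stored separately.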

How much latency does PII redaction add to voice agents? Regex-based redaction adds negligible latency, typically well under a millisecond for a sentence-length transcript (measure on your own traffic). NER models (spaCy, AWS Comprehend Medical) cost considerably more, especially when called over the network, but catch contextual PII like names and addresses that regex misses. Run regex redaction synchronously in the webhook response path, because async scanning creates a race condition where partial transcripts leak to logs before redaction completes, and reserve slower NER passes for audit logs. Define regex patterns once at server startup rather than per request.

What's the difference between PII masking and encryption for voice AI? Masking replaces PII with placeholders ([REDACTED]) in plaintext: useful for logs, but it doesn't protect the original data in transit or at rest. Encryption transforms data into ciphertext using AES-256-GCM, making it unreadable without the decryption key. For GDPR compliance, use encryption for storage and transmission, and masking only for display. Never log raw PII; if you need to correlate records, pseudonymize identifiers with a keyed hash (HMAC-SHA256) before writing to disk, because a plain SHA-256 of a nine-digit SSN can be reversed by hashing all one billion candidates.
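A keyed hash keeps numeric identifiers such as SSNs or card numbers joinable across records while making them irreversible for anyone without the key. Below is a minimal sketch; `pseudonymize`, `canonicalDigits`, and the `pii_` prefix are hypothetical choices, the canonicalization only suits numeric identifiers, and the key belongs in the same key-management system as your encryption keys.

```javascript
const crypto = require('crypto');

// Canonicalize numeric identifiers so "123-45-6789" and "123 45 6789" match.
function canonicalDigits(value) {
  return String(value).replace(/\D/g, '');
}

// Keyed, deterministic pseudonym: same input + same key -> same token.
function pseudonymize(value, keyHex) {
  const digest = crypto
    .createHmac('sha256', Buffer.from(keyHex, 'hex'))
    .update(canonicalDigits(value))
    .digest('hex');
  return `pii_${digest.slice(0, 32)}`; // 128 bits is plenty for joins
}
```

Rotating the pseudonym key breaks joins with previously written tokens, so version it the same way as the encryption keys.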

How does VAPI's security compare to Twilio Voice for handling sensitive data? Twilio offers post-call PII redaction for transcripts through its Voice Intelligence product (check Twilio's docs for current configuration), while VAPI requires custom middleware for real-time filtering. Twilio signs a BAA for its HIPAA-eligible products; for VAPI, confirm BAA availability for your plan. For encrypting stored audio, plan to configure or build the crypto layer yourself on either platform. Use Twilio if you want more turnkey compliance; use VAPI if you need granular control over the redaction pipeline and its pricing suits your call volume.

Why do webhook signature validations fail after deploying to production? Most failures happen because the raw request body gets modified before validation. If you use express.json() middleware before checking the signature, it parses the body and the HMAC hash won't match. Use express.raw({ type: 'application/json' }) for webhook routes, manually parse JSON after validation, and ensure your VAPI_WEBHOOK_SECRET environment variable matches the value in the VAPI dashboard exactly—trailing whitespace or newlines break validation.

Written by

Misal Azeem

Voice AI Engineer & Creator
