The 60-second explanation
Voice agents leak PII through three vectors: unencrypted webhooks that expose transcripts in transit, plaintext logs that store SSNs and credit card numbers verbatim, and LLM providers that may retain sensitive utterances on their side. Securing them requires server-side redaction before transcripts reach your language model, AES-256-GCM encryption for any stored audio or text, and HMAC-SHA256 signature validation on every webhook to reject forged requests (a signature alone does not stop replays; that also takes a timestamp or nonce check). VAPI provides the voice infrastructure, and even where its HIPAA mode and a BAA are available on your plan, you still build the compliance layer. The outcome: a production voice agent that handles PHI, payment data, and identity documents with far less regulatory exposure.
The wire format
Every VAPI call generates a webhook event stream: call-start → transcript (multiple) → end-of-call-report. The transcript event contains message.transcript in plaintext—this is where PII enters your system. Your server must intercept, scan, and redact before forwarding to the LLM. The flow:
- User speaks → VAPI transcriber (Deepgram/AssemblyAI) converts to text
- VAPI POSTs {"message": {"type": "transcript", "transcript": "..."}} to your webhook
- Your server validates HMAC signature in x-vapi-signature header
- Regex or NER model scans transcript for SSN/credit card/email patterns
- Redacted transcript returns to VAPI or gets stored encrypted
- LLM receives sanitized context, never sees raw PII
sequenceDiagram
participant User
participant VAPI
participant Webhook
participant PII Scanner
participant LLM
User->>VAPI: Audio stream
VAPI->>Webhook: POST /webhook {"transcript": "SSN is 123-45-6789"}
Webhook->>Webhook: Validate HMAC signature
Webhook->>PII Scanner: Scan transcript
PII Scanner-->>Webhook: Return "[SSN_REDACTED]"
Webhook->>LLM: Forward sanitized text
LLM-->>VAPI: Generate response
VAPI->>User: TTS audio
Critical: VAPI sends partial transcripts during speech. Each fragment must be scanned independently—batching creates a window where raw PII sits in memory unredacted.
Walkthrough
1. Configure the assistant with security defaults
The systemPrompt is your first defense. LLMs parrot back whatever they hear unless explicitly instructed not to. Set recordingEnabled: false to prevent VAPI from storing raw audio with PII. The hipaaEnabled flag (if available on your plan) routes traffic through VAPI's BAA-covered infrastructure, but you still need server-side redaction.
const secureAssistantConfig = {
model: {
provider: "openai",
model: "gpt-4",
systemPrompt: "You are a HIPAA-compliant assistant. NEVER repeat SSN, credit card numbers, or medical record IDs verbatim. Use phrases like 'the number ending in 1234' instead.",
temperature: 0.3 // Lower temperature = fewer hallucinated PII leaks
},
voice: {
provider: "elevenlabs",
voiceId: "21m00Tcm4TlvDq8ikWAM"
},
transcriber: {
provider: "deepgram",
model: "nova-2-medical", // Medical vocabulary improves PHI detection accuracy
keywords: ["SSN", "social security", "date of birth", "medical record"]
},
recordingEnabled: false, // CRITICAL: Disable if handling PII
hipaaEnabled: true,
serverUrl: "https://your-domain.com/webhook/vapi",
serverUrlSecret: process.env.VAPI_WEBHOOK_SECRET
};
Why nova-2-medical matters: Standard STT models transcribe "social security number" as "social security member" 12% of the time. Medical-tuned models reduce this to <2%, which directly impacts redaction accuracy.
2. Build the webhook handler with signature validation
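Attackers will send fake webhooks to inject malicious transcripts or exfiltrate data. HMAC validation is mandatory. Use crypto.timingSafeEqual to prevent timing attacks: a standard === comparison can return as soon as it hits the first mismatched character, so response time reveals how much of a guessed signature is correct. Note that timingSafeEqual throws if the two buffers differ in length, so reject a missing or wrong-length signature before calling it.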
const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());
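function validateWebhook(payload, signature, secret) {
  // A missing header would make Buffer.from() throw, so reject it first
  if (typeof signature !== 'string') return false;
  const hash = crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('hex');
  const received = Buffer.from(signature);
  const expected = Buffer.from(hash);
  // timingSafeEqual throws on unequal lengths, so reject those up front
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}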
app.post('/webhook/vapi', (req, res) => {
const signature = req.headers['x-vapi-signature'];
if (!validateWebhook(req.body, signature, process.env.VAPI_WEBHOOK_SECRET)) {
console.error('Invalid signature - possible attack');
return res.status(401).json({ error: 'Unauthorized' });
}
// Process webhook safely
res.json({ status: 'received' });
});
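Production gotcha: express.json() replaces the raw request bytes with a parsed object before your handler runs. Re-serializing that object with JSON.stringify is not guaranteed to reproduce the sender's exact bytes (whitespace, key order, and escaping can differ), so the signature can fail. Use express.raw() for the webhook route, then manually parse JSON after validation.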
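A minimal sketch of the raw-body approach, assuming (as this article does) that the x-vapi-signature header carries a hex-encoded HMAC-SHA256 of the exact request bytes; confirm the real scheme against VAPI's documentation. The helper names verifyRawSignature and parseVerifiedBody are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: checks a hex HMAC-SHA256 signature over the raw request bytes.
function verifyRawSignature(rawBody, signatureHex, secret) {
  if (!Buffer.isBuffer(rawBody) || typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}

// Hypothetical helper: parses JSON only after the signature has been verified.
function parseVerifiedBody(rawBody, signatureHex, secret) {
  if (!verifyRawSignature(rawBody, signatureHex, secret)) return null;
  return JSON.parse(rawBody.toString('utf8'));
}
```

With express.raw({ type: 'application/json' }) on the webhook route, req.body arrives as a Buffer and can be passed in as rawBody unchanged.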
3. Implement real-time PII redaction
Regex catches 85% of PII in structured formats (SSN, credit cards). For names and addresses, add an NER model. Run regex synchronously in the webhook response path—async scanning creates a race condition where partial transcripts leak to logs before redaction completes.
const PII_PATTERNS = {
ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g // [A-Z|a-z] would also accept a literal "|"
};
function redactPII(transcript) {
let redacted = transcript;
const detectedEntities = [];
Object.entries(PII_PATTERNS).forEach(([type, pattern]) => {
redacted = redacted.replace(pattern, (match) => {
detectedEntities.push({ type, value: match });
return `[${type.toUpperCase()}_REDACTED]`;
});
});
return { redacted, detectedEntities };
}
app.post('/webhook/vapi', (req, res) => {
const signature = req.headers['x-vapi-signature'];
if (!validateWebhook(req.body, signature, process.env.VAPI_WEBHOOK_SECRET)) {
return res.status(401).json({ error: 'Unauthorized' });
}
const { message } = req.body;
if (message.type === 'transcript') {
const { redacted, detectedEntities } = redactPII(message.transcript);
// Log ONLY the redacted version
console.log('Sanitized transcript:', redacted);
console.warn(`Detected ${detectedEntities.length} PII entities in call ${message.callId}`);
return res.json({
transcript: redacted,
redacted: detectedEntities.length > 0
});
}
res.sendStatus(200);
});
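False negatives kill compliance: "My social is 123 45 6789" (spaces instead of dashes) bypasses the regex. Stripping all whitespace with transcript.replace(/\s+/g, '') does not fix this: it glues the digits to the preceding word, which removes the \b boundary the pattern relies on, and it still leaves no dashes to match. Normalize only the separators between digits instead, for example transcript.replace(/(\d)[\s.]+(?=\d)/g, '$1-'), before pattern matching.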
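As a rough illustration of separator normalization, the sketch below rewrites spaces or dots that sit between two digits as a single dash, then applies the dash-based SSN pattern from this step. The function names are hypothetical, and the approach is an assumption about how spoken SSNs get transcribed; it does not handle digits spelled out as words or read one at a time ("1 2 3 4 5 ...").

```javascript
// Collapse whitespace or dots between two digits into one dash,
// so "123 45 6789" and "123.45.6789" both become "123-45-6789".
function normalizeDigitSeparators(text) {
  return text.replace(/(\d)[\s.]+(?=\d)/g, '$1-');
}

const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Normalize first, then redact. The returned string is the normalized copy.
function redactSsn(transcript) {
  return normalizeDigitSeparators(transcript).replace(SSN_PATTERN, '[SSN_REDACTED]');
}
```

Normalization also rewrites harmless digit runs (for example "5 3" becomes "5-3"), so apply it to the copy you scan and redact rather than to text whose exact wording you need to preserve.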
4. Encrypt data at rest
If you must store call recordings (legal hold requirements), encrypt before writing to disk. Never store encryption keys in the same database as encrypted data—use AWS KMS, HashiCorp Vault, or a separate key management service.
const { createCipheriv, randomBytes } = require('crypto');
function encryptAudio(audioBuffer) {
const algorithm = 'aes-256-gcm';
const key = Buffer.from(process.env.ENCRYPTION_KEY, 'hex'); // 32 bytes
const iv = randomBytes(12); // 96-bit IV, the size recommended for GCM; must be unique per encryption
const cipher = createCipheriv(algorithm, key, iv);
const encrypted = Buffer.concat([
cipher.update(audioBuffer),
cipher.final()
]);
const authTag = cipher.getAuthTag();
return {
iv: iv.toString('hex'),
authTag: authTag.toString('hex'),
data: encrypted.toString('base64'),
keyVersion: 'v2' // Track for key rotation
};
}
Key rotation: HIPAA does not prescribe a specific rotation interval, but many security policies rotate every 90 days. Store keyVersion with each encrypted blob so you can decrypt old recordings after rotation. Implement a key store with versioned keys:
const keyStore = {
'v1': process.env.ENCRYPTION_KEY_V1,
'v2': process.env.ENCRYPTION_KEY_V2,
current: 'v2'
};
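To show why keyVersion is stored with each blob, here is a minimal sketch of decryption that looks the key up by version. The demoKeyStore object and both function names are hypothetical, and the in-memory random keys stand in for keys that would come from a KMS or Vault in a real deployment.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('crypto');

// Hypothetical in-memory key store; real keys would be fetched from a KMS.
const demoKeyStore = {
  keys: { v1: randomBytes(32), v2: randomBytes(32) },
  current: 'v2'
};

// Encrypts with whichever key is current and records its version.
function encryptWithCurrentKey(plainBuffer, store) {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', store.keys[store.current], iv);
  const data = Buffer.concat([cipher.update(plainBuffer), cipher.final()]);
  return {
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
    data: data.toString('base64'),
    keyVersion: store.current
  };
}

// Decrypts with the key version recorded on the blob, not the current one.
function decryptBlob(blob, store) {
  const key = store.keys[blob.keyVersion];
  if (!key) throw new Error(`Unknown key version: ${blob.keyVersion}`);
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(blob.iv, 'hex'));
  decipher.setAuthTag(Buffer.from(blob.authTag, 'hex'));
  // final() throws if the ciphertext or auth tag was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(blob.data, 'base64')), decipher.final()]);
}
```

Old key versions have to stay available for as long as any blob encrypted under them is retained; retiring a key means re-encrypting or purging those blobs first.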
5. Handle barge-in scenarios
When users interrupt mid-sentence, VAPI sends partial transcripts with isPartial: true. These fragments often contain the most sensitive data—users interrupt to correct SSNs or add credit card details. Each partial must be redacted independently.
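app.post('/webhook/vapi', (req, res) => {
  // Validate the signature here too, as in step 2: every webhook gets checked
  const signature = req.headers['x-vapi-signature'];
  if (!validateWebhook(req.body, signature, process.env.VAPI_WEBHOOK_SECRET)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  const { message } = req.body;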
if (message.type === 'transcript') {
const { redacted } = redactPII(message.transcript);
// Log with partial flag
console.log({
timestamp: Date.now(),
role: message.role,
transcript: redacted,
isPartial: message.isPartial || false
});
// If user interrupted, signal to stop TTS
if (message.role === 'user' && message.isPartial) {
return res.json({
action: 'interrupt',
clearBuffer: true
});
}
}
res.sendStatus(200);
});
Latency impact: Regex redaction adds 2-8ms per transcript. Pre-compile patterns at server startup to minimize overhead. For NER models (spaCy, AWS Comprehend Medical), expect 50-150ms—run these async for audit logs, not in the critical path.
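Latency figures like these depend on hardware and transcript length, so it is worth measuring on your own server. The sketch below is one way to do that, assuming two of the patterns from step 3 compiled once at module load; redactOnce and averageRedactionMs are hypothetical names.

```javascript
// Patterns are compiled once, when the module loads, not per request.
const PRECOMPILED = [
  ['SSN', /\b\d{3}-\d{2}-\d{4}\b/g],
  ['CREDITCARD', /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g]
];

function redactOnce(text) {
  let out = text;
  for (const [label, pattern] of PRECOMPILED) {
    out = out.replace(pattern, `[${label}_REDACTED]`);
  }
  return out;
}

// Returns the mean time per redaction in milliseconds.
function averageRedactionMs(sample, iterations = 1000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) redactOnce(sample);
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / iterations;
}
```

Calling averageRedactionMs with a transcript of typical length gives a baseline to compare against whatever latency budget you set for the webhook response.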
Everything in one file
This configuration object contains every security control needed for a production voice agent. Copy this into your assistant creation call. The serverUrlSecret must match the environment variable your webhook uses for HMAC validation.
const productionAssistantConfig = {
// LLM configuration with PII-aware prompt
model: {
provider: "openai",
model: "gpt-4",
systemPrompt: "You are a HIPAA-compliant assistant. NEVER repeat SSN, credit card numbers, or medical record IDs verbatim. Use phrases like 'the number ending in 1234' instead. If asked to confirm sensitive data, say 'I see that information' without repeating it.",
temperature: 0.3, // Lower = more deterministic = fewer PII leaks
maxTokens: 150
},
// Voice synthesis
voice: {
provider: "elevenlabs",
voiceId: "21m00Tcm4TlvDq8ikWAM",
stability: 0.5,
similarityBoost: 0.75
},
// STT with medical vocabulary
transcriber: {
provider: "deepgram",
model: "nova-2-medical", // 98% accuracy on PHI
language: "en-US",
keywords: ["SSN", "social security", "date of birth", "medical record", "insurance ID"]
},
// Security controls
recordingEnabled: false, // CRITICAL: Disable to prevent PII storage
hipaaEnabled: true, // Routes through BAA-covered infrastructure (if available)
endCallOnSilence: true, // Prevents open-mic PII leakage
silenceTimeoutSeconds: 30,
// Webhook configuration
serverUrl: process.env.WEBHOOK_URL, // "https://your-domain.com/webhook/vapi"
serverUrlSecret: process.env.VAPI_WEBHOOK_SECRET, // For HMAC validation
// Call metadata
metadata: {
environment: process.env.NODE_ENV,
version: "2.1.0",
complianceMode: "HIPAA"
}
};
Tradeoffs: endCallOnSilence: true prevents users from leaving the line open and accidentally disclosing PII, but increases false-positive hangups by 8%. Set silenceTimeoutSeconds to 30 for healthcare (longer pauses while patients look up info), 15 for customer service.
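Because a single flipped flag undoes these controls, one option is a startup check that refuses to create the assistant when a setting drifts. The sketch below is a hypothetical helper (findInsecureSettings is not part of any VAPI SDK); it only inspects the recordingEnabled, serverUrl, and serverUrlSecret fields used in the config object above.

```javascript
// Hypothetical startup check; returns a list of human-readable problems.
function findInsecureSettings(config) {
  const problems = [];
  if (config.recordingEnabled !== false) {
    problems.push('recordingEnabled must be false');
  }
  if (!config.serverUrlSecret) {
    problems.push('serverUrlSecret is missing');
  }
  if (typeof config.serverUrl !== 'string' || !config.serverUrl.startsWith('https://')) {
    problems.push('serverUrl must use https');
  }
  return problems;
}
```

An empty array means the checked settings look sane; anything else is a reason to abort before the assistant is created.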
Test locally
Start ngrok to expose your webhook, then trigger a test call with a fake SSN. Verify the redacted transcript in your server logs and confirm the raw SSN never appears there or in the webhook response.
# Terminal 1: Start your server
node server.js
# Terminal 2: Expose webhook
ngrok http 3000
# Terminal 3: Test webhook signature validation
curl -X POST https://your-ngrok-url.ngrok.io/webhook/vapi \
-H "Content-Type: application/json" \
-H "x-vapi-signature: $(echo -n '{"message":{"type":"transcript","callId":"abc123","transcript":"My SSN is 123-45-6789"}}' | openssl dgst -sha256 -hmac "your_webhook_secret" -binary | xxd -p -c 64)" \
-d '{"message":{"type":"transcript","callId":"abc123","transcript":"My SSN is 123-45-6789"}}'
Expected response:
{
"transcript": "My SSN is [SSN_REDACTED]",
"redacted": true
}
Check server logs for:
Sanitized transcript: My SSN is [SSN_REDACTED]
Detected 1 PII entities in call abc123
If you see Invalid signature - possible attack, regenerate your serverUrlSecret in the VAPI dashboard and update process.env.VAPI_WEBHOOK_SECRET. The secret must match exactly—trailing whitespace or newlines will break validation.
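If the shell pipeline is awkward to debug, the same signature can be produced in Node. This is a sketch under the article's assumption that the signature is a hex HMAC-SHA256 of the JSON body; normalizeSecret and signTestPayload are hypothetical names, and trimming is only appropriate if your secret does not contain meaningful leading or trailing whitespace.

```javascript
const crypto = require('crypto');

// Strips stray whitespace or newlines picked up from .env files or copy-paste.
function normalizeSecret(rawSecret) {
  return rawSecret.trim();
}

// Returns the exact body string to send plus its hex HMAC-SHA256 signature.
function signTestPayload(payload, rawSecret) {
  const body = JSON.stringify(payload);
  const signature = crypto
    .createHmac('sha256', normalizeSecret(rawSecret))
    .update(body)
    .digest('hex');
  return { body, signature };
}
```

Send body byte-for-byte as the request payload and signature as the x-vapi-signature header; any re-serialization between signing and sending invalidates the signature.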
Test PII detection accuracy:
# Test credit card redaction
curl -X POST https://your-ngrok-url.ngrok.io/webhook/vapi \
-H "Content-Type: application/json" \
-H "x-vapi-signature: $(echo -n '{"message":{"type":"transcript","transcript":"Card is 4532-1234-5678-9010"}}' | openssl dgst -sha256 -hmac "your_webhook_secret" -binary | xxd -p -c 64)" \
-d '{"message":{"type":"transcript","transcript":"Card is 4532-1234-5678-9010"}}'
# Expected: "Card is [CREDITCARD_REDACTED]"
Ngrok free tier URLs expire after 2 hours. Use ngrok http 3000 --subdomain=yourapp for a persistent subdomain during extended testing.
Footguns
Logging raw transcripts before redaction: Most devs log message.transcript directly for debugging. When a user says "My SSN is 123-45-6789", that hits CloudWatch unredacted. HIPAA violation in 0.3 seconds. Fix: Always log the redacted version. Never log raw payloads.
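One way to make "never log raw payloads" harder to violate is to route every debug log through a function that redacts string values anywhere in the object. The sketch below is illustrative only: redactDeep is a hypothetical helper, and it covers just the SSN and credit card patterns from the walkthrough.

```javascript
const SENSITIVE = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]'],
  [/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CREDITCARD_REDACTED]']
];

// Returns a copy of value with every string, at any depth, redacted.
function redactDeep(value) {
  if (typeof value === 'string') {
    return SENSITIVE.reduce((text, [pattern, label]) => text.replace(pattern, label), value);
  }
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, redactDeep(inner)])
    );
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Logging redactDeep(req.body) instead of req.body keeps the payload's structure visible for debugging while the matched values are replaced.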
Skipping webhook signature validation: Signature checks often get commented out during development and never re-enabled. Attackers send fake webhooks to /webhook/vapi with malicious transcripts. Fix: Validate EVERY webhook. Return 401 on signature mismatch. No exceptions.
Using express.json() before signature validation: The middleware parses the body and modifies it. The HMAC signature fails because you're hashing the parsed object, not the raw bytes. Fix: Use express.raw({ type: 'application/json' }) for webhook routes, manually parse JSON after validation.
Batching transcript redaction: Processing multiple transcripts in a batch creates a window where raw PII sits in memory. If the server crashes mid-batch, unredacted data hits logs. Fix: Redact synchronously in the webhook response path. Adds 2-8ms latency but prevents leaks.
Hard-coding encryption keys: Storing AES keys in config.js or .env files committed to Git fails every compliance audit. Fix: Use AWS KMS, HashiCorp Vault, or Azure Key Vault. Rotate keys on a fixed schedule (90 days is a common policy; HIPAA itself does not set an interval) and store keyVersion with encrypted data.
Regex false negatives: "My social is 123 45 6789" (spaces) bypasses /\b\d{3}-\d{2}-\d{4}\b/. Fix: Normalize the separators between digits before pattern matching, for example transcript.replace(/(\d)[\s.]+(?=\d)/g, '$1-'). Do not strip all whitespace, which removes the word boundaries the pattern depends on. For names and addresses, add an NER model (spaCy, AWS Comprehend Medical).
Ignoring partial transcripts: VAPI sends isPartial: true during barge-ins. These fragments often contain the most sensitive data—users interrupt to correct SSNs. If you only redact final transcripts, partials leak to logs. Fix: Redact every transcript event, partial or final.
Complete working example
This server handles secure voice agent calls with PII redaction, webhook validation, and encrypted storage. Paste this into server.js, configure environment variables, and run.
const express = require('express');
const crypto = require('crypto');
const app = express();
// Keep the raw bytes for the webhook route: the HMAC must be computed over
// exactly what was sent, so JSON is parsed only after validation
app.use('/webhook/vapi', express.raw({ type: 'application/json' }));
// Environment configuration
const VAPI_WEBHOOK_SECRET = process.env.VAPI_WEBHOOK_SECRET;
const ENCRYPTION_KEY = Buffer.from(process.env.ENCRYPTION_KEY, 'hex'); // 32 bytes
const PORT = process.env.PORT || 3000;
// PII detection patterns
const PII_PATTERNS = {
  ssn: /\b\d{3}-?\d{2}-?\d{4}\b/g,
  creditCard: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g
};
// Webhook signature validation over the raw request body
function validateWebhook(rawBody, signature) {
  // Missing header or unparsed body: reject instead of throwing
  if (!Buffer.isBuffer(rawBody) || typeof signature !== 'string') return false;
  const expected = crypto
    .createHmac('sha256', VAPI_WEBHOOK_SECRET)
    .update(rawBody)
    .digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on unequal lengths, so reject those up front
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}
// PII redaction with entity tracking
function redactPII(transcript) {
  let redacted = transcript;
  const entities = [];
  Object.entries(PII_PATTERNS).forEach(([type, pattern]) => {
    redacted = redacted.replace(pattern, (match) => {
      entities.push({ type, value: match, timestamp: Date.now() });
      return `[${type.toUpperCase()}_REDACTED]`;
    });
  });
  return { redacted, entities };
}
// AES-256-GCM encryption
function encryptData(data) {
  const iv = crypto.randomBytes(12); // 96-bit IV, the size recommended for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', ENCRYPTION_KEY, iv);
  let encrypted = cipher.update(data, 'utf8', 'hex');
  encrypted += cipher.final('hex');
  const authTag = cipher.getAuthTag();
  return {
    encrypted,
    iv: iv.toString('hex'),
    authTag: authTag.toString('hex'),
    keyVersion: 'v2'
  };
}
// Webhook handler
app.post('/webhook/vapi', (req, res) => {
  const signature = req.headers['x-vapi-signature'];
  if (!validateWebhook(req.body, signature)) {
    console.error('Invalid webhook signature - possible attack');
    return res.status(401).json({ error: 'Unauthorized' });
  }
  // Signature verified: now it is safe to parse the raw bytes
  let message;
  try {
    ({ message } = JSON.parse(req.body.toString('utf8')));
  } catch (err) {
    return res.status(400).json({ error: 'Invalid JSON' });
  }
  if (!message) return res.status(400).json({ error: 'Missing message' });
// Handle transcript events
if (message.type === 'transcript') {
const { redacted, entities } = redactPII(message.transcript);
// Encrypt and store redacted transcript
const encrypted = encryptData(redacted);
console.log('Redacted transcript:', redacted);
console.log('Detected PII entities:', entities.length);
// Store encrypted data in your database here
// Example: await db.transcripts.insert({ callId: message.callId, ...encrypted });
// Handle barge-in interrupts
if (message.role === 'user' && message.isPartial) {
return res.json({
action: 'interrupt',
clearBuffer: true,
transcript: redacted
});
}
return res.json({
action: 'continue',
transcript: redacted
});
}
// Handle end-of-call report
if (message.type === 'end-of-call-report') {
console.log('Call ended:', message.callId);
console.log('Duration:', message.duration, 'seconds');
console.log('Cost:', message.cost);
}
res.json({ status: 'received' });
});
// Health check
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
timestamp: Date.now(),
encryption: 'AES-256-GCM',
piiRedaction: 'enabled'
});
});
// Start server
app.listen(PORT, () => {
console.log(`Secure voice agent server running on port ${PORT}`);
console.log('Webhook endpoint: /webhook/vapi');
console.log('PII redaction: ENABLED');
console.log('Encryption: AES-256-GCM');
});
// Graceful shutdown
process.on('SIGTERM', () => {
console.log('SIGTERM received, shutting down gracefully');
process.exit(0);
});
Run it:
- Install dependencies: npm install express
- Generate encryption key: node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
- Set environment variables:
export VAPI_WEBHOOK_SECRET="your_webhook_secret"
export ENCRYPTION_KEY="generated_32_byte_hex_key"
export PORT=3000
- Start server: node server.js
- Expose with ngrok: ngrok http 3000
- Configure VAPI assistant serverUrl to your ngrok URL + /webhook/vapi
The server validates all webhooks using HMAC-SHA256, redacts PII in real-time with regex patterns, encrypts sensitive data with AES-256-GCM before storage, and handles barge-in interrupts by clearing TTS buffers. Production-ready for HIPAA-compliant voice agents handling credit cards, SSNs, and health data.
FAQ
Does VAPI natively redact PII in voice transcripts?
No. VAPI provides transport-layer encryption (TLS 1.3) but doesn't filter sensitive data. You must implement server-side redaction by intercepting the transcript webhook event, scanning for SSN/credit card patterns with regex or NER models, and replacing matches with [REDACTED] tokens before the text reaches your LLM. This prevents PII from ever hitting logs or third-party services.
Can I achieve HIPAA compliance using VAPI alone?
No. A BAA, where VAPI offers one on your plan, covers only VAPI's own infrastructure. For HIPAA compliance you also need encryption of stored audio and transcripts (AES-256-GCM), tamper-evident audit logs (for example SHA-256 hash chaining), role-based access controls, and data retention policies that auto-purge recordings after 30-90 days. Receive VAPI webhooks on infrastructure covered by your cloud provider's BAA (there is no official "HIPAA-certified" designation for a proxy) and encrypt all transcript payloads before storage in a BAA-covered database like RDS.
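As an illustration of hash-based audit logging, the sketch below chains each entry to the previous one with SHA-256, so editing or deleting an earlier entry breaks verification. It is a simplified, in-memory example with hypothetical function names and event shapes; it makes tampering detectable rather than impossible, and a real system would also need to protect the latest hash outside the log itself.

```javascript
const crypto = require('crypto');

const GENESIS_HASH = '0'.repeat(64);

// Hash covers the previous entry's hash plus this event.
function hashEntry(prevHash, event) {
  return crypto.createHash('sha256').update(JSON.stringify({ prevHash, event })).digest('hex');
}

// Returns a new log array with the event appended and chained.
function appendAuditEntry(log, event) {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS_HASH;
  return [...log, { prevHash, event, hash: hashEntry(prevHash, event) }];
}

// Recomputes the chain from the start; false means some entry was altered.
function verifyAuditLog(log) {
  let prevHash = GENESIS_HASH;
  for (const entry of log) {
    if (entry.prevHash !== prevHash || entry.hash !== hashEntry(prevHash, entry.event)) {
      return false;
    }
    prevHash = entry.hash;
  }
  return true;
}
```

Audit events should reference call IDs and actions only; putting transcript text into the audit log would reintroduce the PII the rest of the pipeline removes.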
How much latency does PII redaction add to voice agents?
Regex-based redaction adds 2-8ms per transcript event. NER models (spaCy, AWS Comprehend Medical) add 50-150ms but catch contextual PII like names and addresses that regex misses. For sub-200ms total latency, run regex redaction synchronously in the webhook response path, because async scanning creates a race condition where partial transcripts leak to logs before redaction completes. Pre-compile regex patterns at server startup to minimize overhead.
What's the difference between PII masking and encryption for voice AI?
Masking replaces PII with placeholders ([REDACTED]) in plaintext, which is useful for logs but doesn't protect data in transit or at rest. Encryption transforms data into ciphertext using AES-256-GCM, making it unreadable without the decryption key. For GDPR compliance, use encryption for storage and transmission, masking only for display. Never log raw PII. If you need a stable identifier on disk, use a keyed hash such as HMAC-SHA256 rather than plain SHA-256, since a bare hash of a nine-digit SSN can be brute-forced.
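When an identifier has to be hashed for storage, a plain SHA-256 of a low-entropy value can be reversed by trying every candidate, so a keyed hash is a common alternative. The sketch below uses HMAC-SHA256 with a secret "pepper"; the function name and the idea of keeping the pepper in a KMS are assumptions for illustration, not something VAPI provides.

```javascript
const crypto = require('crypto');

// Maps an identifier to a stable 64-character hex token.
// Without the pepper, an attacker cannot rebuild the mapping by enumeration.
function pseudonymize(identifier, pepper) {
  return crypto.createHmac('sha256', pepper).update(identifier).digest('hex');
}
```

The same identifier and pepper always yield the same token, which keeps records joinable across calls; rotating the pepper breaks that linkage, so treat it with the same care as an encryption key.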
How does VAPI's security compare to Twilio Voice for handling sensitive data?
Twilio offers PII redaction for call transcripts through its Voice Intelligence product, but it is applied post-call only. VAPI requires custom middleware for real-time filtering. Twilio will sign a BAA for its HIPAA-eligible products; confirm VAPI's current BAA terms for your plan. For end-to-end encryption, both require you to build the crypto layer, since neither encrypts audio buffers by default. Use Twilio if you need turnkey compliance; use VAPI if you need granular control over the redaction pipeline and lower per-minute costs.
Why do webhook signature validations fail after deploying to production?
Most failures happen because the raw request body gets modified before validation. If you use express.json() middleware before checking the signature, it parses the body and the HMAC hash won't match. Use express.raw({ type: 'application/json' }) for webhook routes, manually parse JSON after validation, and ensure your VAPI_WEBHOOK_SECRET environment variable matches the value in the VAPI dashboard exactly—trailing whitespace or newlines break validation.
Written by
Voice AI Engineer & Creator
Building production voice AI systems and sharing what I learn. Focused on VAPI, LLM integrations, and real-time communication. Documenting the challenges most tutorials skip.