Agent Shortlist

Voice AI Agent

Vapi

Voice agent infrastructure for developers

4.0 / 5 · Developer · Pay-per-minute: ~$0

Our verdict

Developer-first voice infrastructure with strong customisation hooks. Best for teams wanting more pipeline control than Retell, without building from scratch.

Best for

Engineering teams building production voice products who need fine control over the model, voice synthesis provider, and call routing. Strong API and webhook story.

Not for

Non-technical teams, for whom Retell's SDK is more accessible, and teams that don't need the customisation depth Vapi offers.

Overview

Vapi is positioned as the developer's voice infrastructure. Where Retell hides the voice stack, Vapi exposes it: pick your LLM (Claude, GPT, Gemini), pick your voice synthesis provider (ElevenLabs, PlayHT, Cartesia), pick your transcription service. The flexibility comes with a steeper learning curve, because you're configuring more pieces, but for teams shipping voice products at scale or with specific quality requirements, the control is meaningful. Vapi is slightly cheaper than Retell on per-minute pricing and has a stronger webhook and API surface for integrations. It is used by voice-product companies that want to white-label and customise.
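To make the "pick each piece" idea concrete, here is a minimal sketch of a three-part voice stack and a sanity check on it. The class, field names, and provider identifiers are hypothetical and chosen for illustration; they are not Vapi's actual configuration schema.

```python
from dataclasses import dataclass

# Hypothetical field names for illustration; not Vapi's actual schema.
@dataclass(frozen=True)
class VoiceStack:
    llm: str           # e.g. "claude", "gpt", "gemini"
    tts_provider: str  # e.g. "elevenlabs", "playht", "cartesia"
    stt_provider: str  # e.g. "deepgram"

# Options named in this review; a real platform may accept more.
KNOWN_LLMS = {"claude", "gpt", "gemini"}
KNOWN_TTS = {"elevenlabs", "playht", "cartesia"}

def validate(stack: VoiceStack) -> list[str]:
    """Return a list of problems; an empty list means the stack looks usable."""
    problems = []
    if stack.llm not in KNOWN_LLMS:
        problems.append(f"unknown llm: {stack.llm}")
    if stack.tts_provider not in KNOWN_TTS:
        problems.append(f"unknown tts provider: {stack.tts_provider}")
    if not stack.stt_provider:
        problems.append("stt provider is required")
    return problems

premium = VoiceStack(llm="claude", tts_provider="elevenlabs", stt_provider="deepgram")
```

Three independent choices mean three places a configuration can go wrong, which is the learning-curve cost described above.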

What works

  • Multi-vendor model and voice provider support
  • Cheaper per-minute pricing than Retell at scale
  • Strong webhook and API customisation
  • Good for white-labelled voice products
  • Active developer community and docs

What doesn't

  • Steeper learning curve than Retell — more configuration to do
  • Quality depends on which voice provider you select
  • Less polished onboarding for non-developers
  • Documentation occasionally lags new features

What operators use it for

01

Custom Voice Products at Scale

Building a voice product as part of your SaaS (e.g. an AI receptionist feature). Vapi's customisation lets you pick voice quality and pricing trade-offs that match your product's tier.

02

White-Labelled Voice for Agencies

Agencies offering voice agent services to clients. Vapi's flexibility lets you offer different voice quality tiers without locking into one provider.

03

Multi-Language Support

Switching voice synthesis providers per language to get the best quality for each market. Harder to do with Retell's bundled stack.
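Per-language switching can be as simple as a lookup table with a fallback. The sketch below assumes a hypothetical mapping; which provider actually sounds best in each language is something to measure, not something this review asserts.

```python
# Hypothetical mapping for illustration; fill it from your own listening tests.
TTS_BY_LANGUAGE = {"en": "elevenlabs", "es": "cartesia"}
DEFAULT_TTS = "playht"

def tts_for(language_tag: str) -> str:
    """Pick a synthesis provider from a tag such as 'es-MX', ignoring the region."""
    base = language_tag.split("-")[0].lower()
    return TTS_BY_LANGUAGE.get(base, DEFAULT_TTS)
```

For example, `tts_for("es-MX")` resolves through the `"es"` entry, while a language with no entry falls back to `DEFAULT_TTS`.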

04

Cost-Optimised High-Volume Workflows

When voice quality is acceptable from cheaper providers (Cartesia, PlayHT) and you want to keep per-minute costs down. Vapi's flexibility lets you optimise.

05

Voice Agents with Custom Tools

Heavy function-calling workflows — agents that interact with multiple internal systems during a call. Vapi's webhook architecture handles this cleanly.
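A function-calling workflow over webhooks usually reduces to a dispatcher: the platform posts the tool name and arguments, and your server runs the matching function and replies. The sketch below assumes a hypothetical payload shape (`tool`, `arguments`) and a hypothetical `lookup_order` tool; Vapi's real webhook format may differ, so check its docs before relying on these keys.

```python
import json
from typing import Any, Callable

# Registry of callable tools, keyed by the name the agent uses.
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("lookup_order")
def lookup_order(order_id: str) -> dict:
    # Stand-in for a call into an internal system.
    return {"order_id": order_id, "status": "shipped"}

def handle_tool_call(raw_body: str) -> dict:
    """Dispatch one tool-call payload (hypothetical shape) and build the reply."""
    payload = json.loads(raw_body)
    name = payload.get("tool")
    args = payload.get("arguments", {})
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": fn(**args)}
    except TypeError as exc:
        # Typically wrong or missing arguments supplied by the model.
        return {"error": str(exc)}
```

Returning a structured error instead of raising lets the agent recover mid-call, for instance by asking the caller to repeat an order number.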

Pricing

Pay-per-minute: ~$0.05–0.08 per minute, slightly cheaper than Retell at scale. Free tier for evaluation. Volume discounts.

Common questions about Vapi

What is Vapi?

Vapi is a voice AI platform for developers — flexible API-first infrastructure for building voice agents with custom logic, multi-turn flows, and deep integrations. Targets teams that need more control than no-code voice platforms offer.

How much does Vapi cost?

Vapi prices per minute of voice conversation, with rates starting around $0.05/minute and varying by model, voice provider, and transcription quality. Most builders spend $100–$2,000/month depending on call volume. No flat subscription — pure usage-based.
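The figures above can be turned into a quick budget check. This sketch uses only the rates quoted in this review (~$0.05 to $0.08 per minute) and hypothetical function names; `Decimal` is used so that currency division does not pick up float rounding errors.

```python
from decimal import Decimal

def monthly_cost(minutes: int, rate_per_minute: str) -> Decimal:
    """Usage-based cost: minutes of conversation times the per-minute rate."""
    return Decimal(minutes) * Decimal(rate_per_minute)

def minutes_for_budget(budget: str, rate_per_minute: str) -> int:
    """Whole minutes of conversation a monthly budget buys at a given rate."""
    return int(Decimal(budget) // Decimal(rate_per_minute))
```

At the quoted rates, a $100 budget covers about 1,250 minutes at $0.08 or 2,000 minutes at $0.05, and a $2,000 budget covers 25,000 to 40,000 minutes.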

Vapi vs Retell?

Vapi is more flexible and developer-oriented — better for complex workflows with custom logic, branching, and external API calls. Retell is faster to ship and slightly better on latency for standard flows. For straightforward voice agents, Retell. For ambitious custom flows, Vapi.

What languages does Vapi support?

Vapi supports 30+ languages via the underlying speech-to-text and text-to-speech providers (Deepgram, ElevenLabs, OpenAI, etc.). English and Spanish are best-supported; quality varies by language and voice provider chosen.

Open dataset. This review is part of a structured dataset of every platform on the shortlist, published as platforms.json on GitHub under CC-BY-4.0.