Provider model lineups change frequently. The specific model names here are current examples — always check the relevant LLM provider page or the provider’s own docs for the latest recommended model.
Quick Decision Guide
| Primary concern | Transcriber | LLM | Synthesizer |
|---|---|---|---|
| English, lowest latency | Deepgram Nova-3 | gpt-5.4-mini | ElevenLabs Turbo v2.5 |
| Indian languages (Hindi, Tamil, etc.) | Sarvam | gpt-5.4-mini or Sarvam | Sarvam |
| European multilingual | Azure Speech | gpt-5.4 | Azure TTS |
| Enterprise / data residency | Azure Speech | Azure OpenAI | Azure TTS |
| Cost-sensitive high volume | Deepgram Nova-2 | deepseek-v4-flash | AWS Polly |
| Complex reasoning or sensitive domain | Deepgram Nova-3 | gpt-5.4 or claude-sonnet-5 | ElevenLabs |
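
The table above can also be kept as a small lookup in code, so the choice of stack is explicit and easy to review. The sketch below is only an illustration: the keys, the `Stack` type, and `recommend_stack` are hypothetical names, not part of any Bolna API, and the model names are the same examples as in the table and may be out of date.

```python
from typing import NamedTuple


class Stack(NamedTuple):
    """One row of the quick decision guide (hypothetical helper type)."""
    transcriber: str
    llm: str
    synthesizer: str


# Keys are illustrative labels for the "Primary concern" column.
DECISION_GUIDE = {
    "english_low_latency": Stack("Deepgram Nova-3", "gpt-5.4-mini", "ElevenLabs Turbo v2.5"),
    "indian_languages": Stack("Sarvam", "gpt-5.4-mini or Sarvam", "Sarvam"),
    "european_multilingual": Stack("Azure Speech", "gpt-5.4", "Azure TTS"),
    "enterprise_data_residency": Stack("Azure Speech", "Azure OpenAI", "Azure TTS"),
    "cost_sensitive_high_volume": Stack("Deepgram Nova-2", "deepseek-v4-flash", "AWS Polly"),
    "complex_or_sensitive": Stack("Deepgram Nova-3", "gpt-5.4 or claude-sonnet-5", "ElevenLabs"),
}


def recommend_stack(concern: str) -> Stack:
    """Return the suggested stack for a concern, or raise ValueError if unknown."""
    try:
        return DECISION_GUIDE[concern]
    except KeyError:
        known = ", ".join(sorted(DECISION_GUIDE))
        raise ValueError(f"Unknown concern {concern!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(recommend_stack("indian_languages"))
```

Treat the result as a starting point and confirm current model names on each provider's page before deploying.
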
Transcribers (speech-to-text)
Language support is the primary selection factor: most transcribers are optimized for specific language families.

| Provider | Best for | Notes |
|---|---|---|
| Deepgram Nova-3 | English | Fastest; best accuracy for English |
| Deepgram Nova-2 | Many languages | Broader language coverage |
| ElevenLabs | English | High accuracy; slightly higher latency |
| Azure Speech | Enterprise, many languages | Strong multilingual; good for EU |
| AssemblyAI | English | High accuracy; async features |
| Sarvam | Indian languages | Best for Hindi, Tamil, Bengali, Telugu, Marathi, etc. |
| Gladia | European languages | Good multilingual coverage |
| Soniox | English | Low latency option |

Key parameters: endpointing (silence detection), language (always set explicitly, because auto-detection adds latency), encoding / sampling_rate (must match your telephony provider).
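
A quick pre-flight check can catch the two most common mistakes with these parameters: leaving the language unset and mismatching the audio settings with telephony. The sketch below uses the parameter names from this page, but the dictionary layout, the `provider` key, the example values (`"en"`, `400`, `"linear16"`, `8000`), and the function name are assumptions made for illustration; check the actual schema and your telephony provider's audio format before relying on them.

```python
def check_transcriber_config(config: dict, telephony_sampling_rate: int,
                             telephony_encoding: str) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    # Auto-detection adds latency, so an explicit language is expected.
    if not config.get("language"):
        problems.append("language is not set explicitly")
    # Audio settings must match what the telephony provider delivers.
    if config.get("sampling_rate") != telephony_sampling_rate:
        problems.append(
            f"sampling_rate {config.get('sampling_rate')} does not match "
            f"telephony rate {telephony_sampling_rate}"
        )
    if config.get("encoding") != telephony_encoding:
        problems.append(
            f"encoding {config.get('encoding')!r} does not match "
            f"telephony encoding {telephony_encoding!r}"
        )
    return problems


# Placeholder values, assumed for this example only.
transcriber_config = {
    "provider": "deepgram",   # hypothetical key
    "language": "en",
    "endpointing": 400,       # silence threshold; unit assumed to be milliseconds
    "encoding": "linear16",
    "sampling_rate": 8000,
}

if __name__ == "__main__":
    print(check_transcriber_config(transcriber_config, 8000, "linear16"))
```
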
LLMs (language models)
Quality tier and latency are the primary selection factors. See each provider’s page for current model names.

| Tier | When to use | Examples |
|---|---|---|
| Fast / cost-efficient | Most agents — lead qualification, reminders, scheduling, FAQs | gpt-5.4-mini, gemini-2.5-flash, deepseek-v4-flash |
| High quality | Complex reasoning, financial/medical, sensitive conversations | gpt-5.4, gpt-5.5, claude-sonnet-5 |
| Enterprise | Data residency requirements | Azure OpenAI |
| Custom / self-hosted | On-premise or proprietary models | Custom LLM |

| Provider | Models available |
|---|---|
| OpenAI | GPT-5.4-mini, GPT-5.4, GPT-5.5 |
| Anthropic | Claude Sonnet 5, Haiku 4.5, Opus 4.8 |
| Google Gemini | Gemini 2.5 Flash, Gemini 3.x |
| Azure OpenAI | GPT-5.x via Azure infrastructure |
| DeepSeek | DeepSeek V4 Flash, V4 Pro |
| OpenRouter | Unified gateway to all providers |

Synthesizers (text-to-speech)
Latency and language are the primary selection factors. Always enable stream: true.

| Provider | Best for | Notes |
|---|---|---|
| ElevenLabs | Natural English voices | Bolna default; Turbo models are fastest |
| Cartesia | Lowest latency English | Very fast time-to-first-audio |
| Azure TTS | Multilingual, enterprise | Strong across many languages |
| AWS Polly | Cost-sensitive workloads | Lower cost; neural voices available |
| Deepgram Aura | English | Fast and accurate |
| Rime | Natural conversational English | Good prosody for dialogue |
| Sarvam | Indian languages | Native voices for Hindi, Tamil, Telugu, and more |
| Smallest AI | Ultra-low latency | Optimized for real-time |

Key parameters: stream: true (always enable), buffer_size (100–250 chars typical), audio_format (must match your telephony provider).
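
The same kind of pre-flight check applies to the synthesizer settings. This is a minimal sketch under stated assumptions: the parameter names come from this page, while the dictionary layout, the `provider` key, the `"wav"` value, and the function name are hypothetical. Because 100–250 characters is described only as typical, a value outside that range is reported as a warning, not treated as invalid.

```python
def check_synthesizer_config(config: dict, telephony_audio_format: str) -> list:
    """Return a list of warnings; an empty list means nothing looked off."""
    warnings = []
    # Streaming should always be enabled to keep time-to-first-audio low.
    if config.get("stream") is not True:
        warnings.append("stream should be true")
    # 100-250 characters is the typical range mentioned on this page.
    buffer_size = config.get("buffer_size")
    if not isinstance(buffer_size, int) or not 100 <= buffer_size <= 250:
        warnings.append(
            f"buffer_size {buffer_size!r} is outside the typical 100-250 character range"
        )
    # Output audio must be in the format the telephony provider expects.
    if config.get("audio_format") != telephony_audio_format:
        warnings.append(
            f"audio_format {config.get('audio_format')!r} does not match "
            f"telephony format {telephony_audio_format!r}"
        )
    return warnings


# Placeholder values, assumed for this example only.
synthesizer_config = {
    "provider": "elevenlabs",  # hypothetical key
    "stream": True,
    "buffer_size": 150,
    "audio_format": "wav",
}

if __name__ == "__main__":
    print(check_synthesizer_config(synthesizer_config, "wav"))
```
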
Telephony providers
| Provider | Best for | Notes |
|---|---|---|
| Plivo | India + global | Default for most Bolna deployments |
| Exotel | India | Strong local support; DLT compliance built in |
| Twilio | US / global | Widest geographic reach |
| Vobiz | India | Competitive rates |
| Custom SIP (BYOT) | On-premise / bring your own trunk | See SIP Trunking |

