Every Bolna agent uses three provider categories: a transcriber (speech-to-text), an LLM (language model), and a synthesizer (text-to-speech). This page helps you choose the right combination for your use case.
Provider model lineups change frequently. The specific model names here are current examples — always check the relevant LLM provider page or the provider’s own docs for the latest recommended model.

Quick Decision Guide

| Primary concern | Transcriber | LLM | Synthesizer |
|---|---|---|---|
| English, lowest latency | Deepgram Nova-3 | gpt-5.4-mini | ElevenLabs Turbo v2.5 |
| Indian languages (Hindi, Tamil, etc.) | Sarvam | gpt-5.4-mini or Sarvam | Sarvam |
| European multilingual | Azure Speech | gpt-5.4 | Azure TTS |
| Enterprise / data residency | Azure Speech | Azure OpenAI | Azure TTS |
| Cost-sensitive, high volume | Deepgram Nova-2 | deepseek-v4-flash | AWS Polly |
| Complex reasoning or sensitive domain | Deepgram Nova-3 | gpt-5.4 or claude-sonnet-5 | ElevenLabs |
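The guide above can be encoded as a lookup table, for example in a script that provisions agents with sensible defaults. This is a minimal sketch. `ProviderStack`, `QUICK_GUIDE`, the concern keys and `recommend` are hypothetical names, not part of any Bolna SDK. The values are the human-readable labels from the table, not identifiers the Bolna API is known to accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderStack:
    """Hypothetical container for the three providers every agent needs."""
    transcriber: str
    llm: str
    synthesizer: str


# Rows of the quick decision guide. Values are display labels copied from
# the table, not validated provider or model identifiers.
QUICK_GUIDE: dict[str, ProviderStack] = {
    "english_low_latency": ProviderStack("Deepgram Nova-3", "gpt-5.4-mini", "ElevenLabs Turbo v2.5"),
    "indian_languages": ProviderStack("Sarvam", "gpt-5.4-mini", "Sarvam"),
    "european_multilingual": ProviderStack("Azure Speech", "gpt-5.4", "Azure TTS"),
    "enterprise_data_residency": ProviderStack("Azure Speech", "Azure OpenAI", "Azure TTS"),
    "cost_sensitive": ProviderStack("Deepgram Nova-2", "deepseek-v4-flash", "AWS Polly"),
    "complex_reasoning": ProviderStack("Deepgram Nova-3", "gpt-5.4", "ElevenLabs"),
}


def recommend(concern: str) -> ProviderStack:
    """Return the suggested provider stack for a primary concern."""
    try:
        return QUICK_GUIDE[concern]
    except KeyError:
        known = ", ".join(sorted(QUICK_GUIDE))
        raise ValueError(f"unknown concern {concern!r}; expected one of: {known}") from None
```

For example, `recommend("indian_languages").synthesizer` is `"Sarvam"`. Some rows offer a choice, such as "gpt-5.4-mini or Sarvam". The sketch keeps only the first option; a real script might store both.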

Transcribers (speech-to-text)

Language support is the primary selection factor — most transcribers are optimized for specific language families.
| Provider | Best for | Notes |
|---|---|---|
| Deepgram Nova-3 | English | Fastest; best accuracy for English |
| Deepgram Nova-2 | Many languages | Broader language coverage |
| ElevenLabs | English | High accuracy; slightly higher latency |
| Azure Speech | Enterprise, many languages | Strong multilingual; good for EU |
| AssemblyAI | English | High accuracy; async features |
| Sarvam | Indian languages | Best for Hindi, Tamil, Bengali, Telugu, Marathi, etc. |
| Gladia | European languages | Good multilingual coverage |
| Soniox | English | Low-latency option |
Key settings:
- `endpointing`: silence detection.
- `language`: always set it explicitly, because auto-detection adds latency.
- `encoding` / `sampling_rate`: must match your telephony provider.
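A pre-flight check along these lines can catch a missing language or endpointing setting before an agent goes live. It is a sketch under stated assumptions, and it covers `language` and `endpointing` only. The key names come from this page. The dictionary shape, the use of `"auto"` to mean auto-detection, and `check_transcriber` itself are hypothetical. No unit is assumed for `endpointing`.

```python
def check_transcriber(settings: dict) -> list[str]:
    """Return human-readable problems with a transcriber config; [] means none found.

    `settings` is a hypothetical dict keyed by the setting names on this page.
    """
    problems = []

    # Auto-detection adds latency, so require an explicit language code.
    # Treating a missing value or "auto" as auto-detection is an assumption.
    language = settings.get("language")
    if not language or language == "auto":
        problems.append("set 'language' explicitly; auto-detection adds latency")

    # Endpointing drives silence detection. Its unit is not assumed here;
    # the check only requires a positive number (booleans are rejected).
    endpointing = settings.get("endpointing")
    if isinstance(endpointing, bool) or not isinstance(endpointing, (int, float)) or endpointing <= 0:
        problems.append("set 'endpointing' to a positive number")

    return problems


print(check_transcriber({"language": "hi", "endpointing": 100}))  # prints []
print(check_transcriber({"language": "auto"}))  # reports both problems
```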

LLMs (language models)

Quality tier and latency are the primary selection factors. See each provider’s page for current model names.
| Tier | When to use | Examples |
|---|---|---|
| Fast / cost-efficient | Most agents: lead qualification, reminders, scheduling, FAQs | gpt-5.4-mini, gemini-2.5-flash, deepseek-v4-flash |
| High quality | Complex reasoning, financial/medical, sensitive conversations | gpt-5.4, gpt-5.5, claude-sonnet-5 |
| Enterprise | Data residency requirements | Azure OpenAI |
| Custom / self-hosted | On-premise or proprietary models | Custom LLM |
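The tiers above suggest an order of precedence. Hard deployment constraints, such as on-premise hosting or data residency, settle the tier before quality is considered. The sketch below illustrates that ordering. The `Requirements` flags and `choose_llm_tier` are hypothetical. Checking self-hosting ahead of data residency is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FAST = "Fast / cost-efficient"
    HIGH_QUALITY = "High quality"
    ENTERPRISE = "Enterprise"
    CUSTOM = "Custom / self-hosted"


@dataclass
class Requirements:
    # Hypothetical flags describing what an agent needs.
    on_premise: bool = False
    data_residency: bool = False
    complex_reasoning: bool = False  # e.g. financial/medical or sensitive conversations


def choose_llm_tier(req: Requirements) -> Tier:
    # Hard deployment constraints come first (assumed ordering).
    if req.on_premise:
        return Tier.CUSTOM
    if req.data_residency:
        return Tier.ENTERPRISE
    # Next, quality: use the larger models only when the conversation needs them.
    if req.complex_reasoning:
        return Tier.HIGH_QUALITY
    # Default for most agents: lead qualification, reminders, scheduling, FAQs.
    return Tier.FAST
```

For example, `choose_llm_tier(Requirements(complex_reasoning=True))` returns `Tier.HIGH_QUALITY`. The Enterprise tier does not rule out a large model, since Azure OpenAI serves GPT-5.x models.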
LLM provider pages:

- OpenAI: GPT-5.4-mini, GPT-5.4, GPT-5.5
- Anthropic: Claude Sonnet 5, Haiku 4.5, Opus 4.8
- Google Gemini: Gemini 2.5 Flash, Gemini 3.x
- Azure OpenAI: GPT-5.x via Azure infrastructure
- DeepSeek: DeepSeek V4 Flash, V4 Pro
- OpenRouter: unified gateway to all providers

Synthesizers (text-to-speech)

Latency and language are the primary selection factors. Always enable stream: true.
| Provider | Best for | Notes |
|---|---|---|
| ElevenLabs | Natural English voices | Bolna default; Turbo models are fastest |
| Cartesia | Lowest-latency English | Very fast time-to-first-audio |
| Azure TTS | Multilingual, enterprise | Strong across many languages |
| AWS Polly | Cost-sensitive workloads | Lower cost; neural voices available |
| Deepgram Aura | English | Fast and accurate |
| Rime | Natural conversational English | Good prosody for dialogue |
| Sarvam | Indian languages | Native voices for Hindi, Tamil, Telugu, and more |
| Smallest AI | Ultra-low latency | Optimized for real-time |
Key settings:
- `stream: true`: always enable.
- `buffer_size`: typically 100–250 characters.
- `audio_format`: must match your telephony provider.
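With streaming enabled, the synthesizer can start speaking before the LLM has finished its reply. `buffer_size`, which this page gives in characters, presumably controls how much text is collected before it is sent on. The sketch below illustrates that general buffering idea and is not Bolna's implementation. It collects streamed tokens and emits a chunk at a sentence boundary once at least `min_chars` have accumulated. It also emits a chunk whenever the buffer reaches `max_chars`. The function name, the two thresholds (borrowed from the 100–250 range above) and the sentence-boundary rule are all assumptions.

```python
from collections.abc import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def chunk_for_tts(tokens: Iterable[str], min_chars: int = 100, max_chars: int = 250) -> Iterator[str]:
    """Group streamed LLM tokens into text chunks for a streaming synthesizer.

    A chunk is emitted when the buffered text ends a sentence and is at least
    `min_chars` long, or as soon as the buffer has grown to `max_chars` or more
    (even mid-sentence). Any text left when the stream ends is flushed last.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        text = buffer.strip()
        ends_sentence = text.endswith(SENTENCE_END)
        if (ends_sentence and len(text) >= min_chars) or len(buffer) >= max_chars:
            yield text
            buffer = ""
    # Flush whatever is left, e.g. a short final sentence.
    if buffer.strip():
        yield buffer.strip()


tokens = ["Hello", " there", ".", " How", " are", " you", "?"]
print(list(chunk_for_tts(tokens, min_chars=5)))
```

The trade-off is the usual one. Smaller chunks reach the synthesizer sooner, while larger chunks give it more context for natural prosody. A production version would also avoid splitting words when it hits the `max_chars` cap.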

Telephony providers

| Provider | Best for | Notes |
|---|---|---|
| Plivo | India + global | Default for most Bolna deployments |
| Exotel | India | Strong local support; DLT compliance built in |
| Twilio | US / global | Widest geographic reach |
| Vobiz | India | Competitive rates |
| Custom SIP (BYOT) | On-premise / bring your own trunk | See SIP Trunking |
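Your telephony provider determines the audio format on the call leg. The transcriber (`encoding` / `sampling_rate`) and the synthesizer (`audio_format`) both have to agree with it, and a small consistency check can catch mismatches early. This is a sketch: `AudioFormat` and `find_format_mismatches` are hypothetical. The example values only illustrate narrowband (8 kHz μ-law) telephone audio. They are not a statement of what any particular provider sends, so take the real values from your provider's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    encoding: str        # e.g. "mulaw" or "linear16" (labels are illustrative)
    sample_rate_hz: int  # e.g. 8000 for narrowband telephony


def find_format_mismatches(telephony: AudioFormat, **components: AudioFormat) -> list[str]:
    """Compare each pipeline component's audio format with the telephony leg.

    Returns one message per mismatching component; an empty list means
    every component agrees with the telephony format.
    """
    problems = []
    for name, fmt in components.items():
        if fmt != telephony:
            problems.append(
                f"{name}: {fmt.encoding} @ {fmt.sample_rate_hz} Hz, "
                f"but telephony uses {telephony.encoding} @ {telephony.sample_rate_hz} Hz"
            )
    return problems


phone = AudioFormat("mulaw", 8000)
print(find_format_mismatches(
    phone,
    transcriber=AudioFormat("mulaw", 8000),
    synthesizer=AudioFormat("linear16", 16000),
))  # reports only the synthesizer
```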