The Audio Tab controls how your agent listens and speaks. Configure languages, choose transcription and voice providers, and tune audio quality. For multilingual agents, you can select different STT and TTS providers per language.
[Screenshot: Full Audio Tab view. The Languages section shows English (Primary), Hindi, and Dutch; Speech-to-Text shows Deepgram nova-3 and Keywords; Text-to-Speech shows ElevenLabs Eleven Turbo v2.5, the Nila voice, and tuning sliders for Buffer Size, Speed rate, Similarity Boost, Stability, and Style Exaggeration.]

Languages

Set the languages your agent can understand and speak. Pick a primary language and add secondary languages for multilingual conversations.
[Screenshot: Bolna Audio Tab language configuration showing English marked as Primary, Dutch and Hindi as secondary languages, and an Add Language button on the right.]
  • Primary Language is marked with (Primary) and is the language your agent uses at the start of every conversation. The main prompt and multilingual settings are tied to this language.
  • Secondary Languages allow the agent to understand and respond when a caller switches languages mid-call.
  • Click + Add Language to add more languages.
  • Remove any language by clicking the x next to it.

Changing the Primary Language

Click the crown icon next to any secondary language to make it primary. A tooltip will confirm the action, for example “Make Hindi primary”. This sets the selected language as the default for the main prompt and multilingual settings.
[Screenshot: Tooltip showing the "Make Hindi primary" option when clicking the crown icon next to Hindi in the Bolna language selector.]

Supported Languages

Language     Code
English      en
Hindi        hi
Bengali      bn
Assamese     as
French       fr
Gujarati     gu
Indonesian   id
Kannada      kn
Malay        ms
Malayalam    ml
Marathi      mr
Odia         od
Punjabi      pa
Spanish      es
Tamil        ta
Telugu       te
Urdu         ur
Dutch        nl

For agents that handle multiple languages in a single call, see the Multilingual Support guide.
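
The primary and secondary language rules above can be pictured with a small Python sketch. The `LanguageSettings` class and its `make_primary` method are hypothetical names used only for illustration, not part of any Bolna SDK. The sketch also assumes that promoting a secondary language demotes the old primary to secondary, which the page does not state.

```python
from dataclasses import dataclass, field

# Language codes copied from the Supported Languages table.
SUPPORTED_LANGUAGES = {
    "en": "English", "hi": "Hindi", "bn": "Bengali", "as": "Assamese",
    "fr": "French", "gu": "Gujarati", "id": "Indonesian", "kn": "Kannada",
    "ms": "Malay", "ml": "Malayalam", "mr": "Marathi", "od": "Odia",
    "pa": "Punjabi", "es": "Spanish", "ta": "Tamil", "te": "Telugu",
    "ur": "Urdu", "nl": "Dutch",
}


@dataclass
class LanguageSettings:
    """Hypothetical container: one primary language plus optional secondaries."""
    primary: str
    secondary: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any code that is not in the supported list.
        for code in [self.primary, *self.secondary]:
            if code not in SUPPORTED_LANGUAGES:
                raise ValueError(f"Unsupported language code: {code!r}")
        if self.primary in self.secondary:
            raise ValueError("Primary language must not also be secondary")

    def make_primary(self, code: str) -> None:
        """Mimic the crown icon: promote a secondary language to primary.

        Assumption: the previous primary takes the promoted language's slot.
        """
        if code not in self.secondary:
            raise ValueError(f"{code!r} is not a secondary language")
        index = self.secondary.index(code)
        self.secondary[index] = self.primary
        self.primary = code


settings = LanguageSettings(primary="en", secondary=["hi", "nl"])
settings.make_primary("hi")
```

After the call, `settings.primary` is `"hi"` and English sits in the secondary list in the slot Hindi used to occupy.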

Speech-to-Text

The Speech-to-Text section controls how your agent converts the caller’s spoken words into text before the LLM processes them. For multilingual agents, each language can have its own STT provider and model. Select a language tab to configure that language's transcription settings independently.
Different languages may perform better with different providers. For example, use Sarvam for Hindi and Deepgram for English.
[Screenshot: Bolna Speech-to-Text settings showing the Provider dropdown set to Azure, the Model dropdown set to Azure, and the Keywords input field with Bruce:100 as an example keyword boost entry.]

Provider and Model

Choose a transcription provider from the Provider dropdown, then pick the specific model from the Model dropdown.
Provider     What it offers
AssemblyAI   Real-time transcription with strong punctuation and formatting
Azure        Microsoft Azure Speech Services
Deepgram     High-accuracy, low-latency transcription with keyword boosting
ElevenLabs   Transcription powered by ElevenLabs
Gladia       Multilingual transcription service
Google       Google Cloud Speech-to-Text
OpenAI       OpenAI Whisper-based transcription
Sarvam       Optimized for Indian languages like Hindi, Tamil, and Telugu
Smallest     Lightweight, fast transcription provider
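
To make the per-language idea concrete, here is a minimal Python sketch of a mapping from language code to an STT choice. The dictionary layout and the `validate_stt_settings` helper are assumptions for illustration; they do not reflect Bolna's actual configuration format. The Hindi model is left as `None` because this page does not name a Sarvam STT model.

```python
from typing import Optional

# Provider names copied from the provider table.
STT_PROVIDERS = frozenset({
    "AssemblyAI", "Azure", "Deepgram", "ElevenLabs", "Gladia",
    "Google", "OpenAI", "Sarvam", "Smallest",
})


def validate_stt_settings(per_language: dict[str, dict[str, Optional[str]]]) -> None:
    """Raise ValueError if any language names a provider outside the list."""
    for language, choice in per_language.items():
        provider = choice.get("provider")
        if provider not in STT_PROVIDERS:
            raise ValueError(f"{language}: unknown STT provider {provider!r}")


# Hypothetical layout: one entry per language tab.
stt_settings = {
    "en": {"provider": "Deepgram", "model": "nova-3"},
    "hi": {"provider": "Sarvam", "model": None},  # model not named on this page
}
validate_stt_settings(stt_settings)  # passes silently
```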

Keywords

Boost recognition accuracy for specific words the transcriber might miss, such as brand names, product names, or technical terms. Enter keywords in the format word:boost_value (e.g., Bruce:100).
Keyword boosting is only available with Deepgram. The Keywords field has no effect when using other providers.
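
As a rough illustration of the `word:boost_value` format, the Python sketch below parses keyword entries and drops them for providers other than Deepgram. Both function names are hypothetical. The sketch assumes that multiple entries are comma-separated and that boost values are integers; the page only shows the single entry `Bruce:100`.

```python
def parse_keywords(text: str) -> dict[str, int]:
    """Parse 'word:boost_value' entries such as 'Bruce:100'.

    Assumptions: entries are comma-separated and boosts are integers.
    """
    keywords: dict[str, int] = {}
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the word itself may contain colons.
        word, sep, boost = entry.rpartition(":")
        if not sep or not word.strip():
            raise ValueError(f"Expected word:boost_value, got {entry!r}")
        keywords[word.strip()] = int(boost)
    return keywords


def effective_keywords(provider: str, keywords: dict[str, int]) -> dict[str, int]:
    """Keyword boosting only applies with Deepgram; other providers ignore it."""
    return keywords if provider == "Deepgram" else {}


boosts = parse_keywords("Bruce:100")
```

Calling `effective_keywords("Azure", boosts)` returns an empty dictionary, mirroring the note that the Keywords field has no effect outside Deepgram.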

Text-to-Speech

The Text-to-Speech section controls how your agent sounds when speaking to the caller. For multilingual agents, each language can have its own TTS provider, model, and voice. Select a language tab to configure that language's voice settings independently.
[Screenshot: Bolna Text-to-Speech settings showing the Sarvam provider, the Bulbul v2 model, and the Anjura voice selected, with sliders for Buffer Size at 220, Speed rate at 1, Similarity Boost at 0.65, Stability at 0.7, and Style Exaggeration at 0.]

Provider, Model, and Voice

  1. Select a Provider
     Choose from AzureTTS, Cartesia, ElevenLabs, or Sarvam.
  2. Pick a Model
     Select the model that fits your latency and quality needs (e.g., ElevenLabs eleven_turbo_v2_5 for low latency).
  3. Choose a Voice
     Click the Voice dropdown to browse all voices for the selected provider and model.

Browsing Voices

Click the Voice dropdown to see a searchable list of all available voices. Filter by gender using the All, Male, Female, and Neutral tabs. Each voice shows a play button so you can preview it before selecting.
[Screenshot: Bolna voice selector dropdown with a search bar, gender filter tabs for All, Male, Female, and Neutral, and voices including Chef DJ, Viraj, Ben, Roger, Matt, and Angelica with play buttons.]

Preview Welcome Message

Click Preview welcome message to hear the selected voice speak your agent’s welcome prompt (configured in the Agent Tab). This lets you test how the voice sounds before going live.

Voice Tuning Parameters

Fine-tune your agent’s voice using the sliders below the voice selector. Available parameters may vary by provider.
Parameter            What it controls
Buffer Size          Audio buffered before playback begins. Higher values produce smoother audio but increase delay. Values between 150 and 250 work well for real-time conversations.
Speed Rate           Speaking speed. 1 is natural pace, above 1 is faster, below 1 is slower.
Similarity Boost     How closely the output matches the original voice sample. Higher values are more faithful but may reduce naturalness.
Stability            Voice consistency across sentences. Higher values keep tone steady, lower values add expressive variation.
Style Exaggeration   Emphasis on stylistic characteristics. 0 is neutral, higher values add more personality.
A high Buffer Size produces smoother audio but adds latency. If callers notice a delay before the agent speaks, lower this value.
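
The guidance above can be summarized in a short Python sketch that flags slider values likely to affect a live call. `VoiceTuning` and `tuning_notes` are hypothetical names, not Bolna API objects. The defaults mirror the example values in the Text-to-Speech screenshot description, and the 150 to 250 band comes from the Buffer Size description.

```python
from dataclasses import dataclass


@dataclass
class VoiceTuning:
    """Hypothetical container for the slider values; not a Bolna API object."""
    buffer_size: int = 220
    speed_rate: float = 1.0           # 1 is natural pace
    similarity_boost: float = 0.65
    stability: float = 0.7
    style_exaggeration: float = 0.0   # 0 is neutral


def tuning_notes(tuning: VoiceTuning) -> list[str]:
    """Return plain-language notes for values outside the described defaults."""
    notes: list[str] = []
    if tuning.buffer_size > 250:
        notes.append("Buffer Size above 250: smoother audio, but callers may notice a delay.")
    elif tuning.buffer_size < 150:
        notes.append("Buffer Size below 150: lower delay, but audio may be less smooth.")
    if tuning.speed_rate > 1:
        notes.append("Speed Rate above 1: faster than natural pace.")
    elif tuning.speed_rate < 1:
        notes.append("Speed Rate below 1: slower than natural pace.")
    if tuning.style_exaggeration > 0:
        notes.append("Style Exaggeration above 0: adds more personality.")
    return notes


for note in tuning_notes(VoiceTuning(buffer_size=400, speed_rate=1.2)):
    print(note)
```

With the default values, `tuning_notes(VoiceTuning())` returns an empty list, since every slider sits inside the ranges the table describes as neutral or suitable for real-time use.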

Adding and Cloning Voices

Click the Add Voice + button in the Text-to-Speech section to add a custom voice by ID or clone one from an audio sample.
Custom voice uploads are only available for ElevenLabs and Cartesia.

Add a Voice by ID

Use this when you already have a voice ID from your provider’s voice library.
[Screenshot: Bolna Add Voice modal with the Add by ID tab active, ElevenLabs selected as provider, a Voice ID input field with placeholder pNInz6obpgDQGcFmaJgB, and a blue Add voice button.]

  1. Select the Add by ID tab
     Make sure the Add by ID tab is selected in the dialog.
  2. Choose a Provider
     Select ElevenLabs or Cartesia.
  3. Enter the Voice ID
     Paste the voice ID from your provider. For ElevenLabs, find IDs in the ElevenLabs voice library.
  4. Click Add voice
     The voice will appear in the Voice dropdown for all your agents.

Clone a Voice

Create a new voice by uploading an audio recording. Useful for maintaining a consistent brand voice or using a specific person’s voice (with their permission).
[Screenshot: Bolna Clone Voice modal with Cartesia selected as provider, Voice name placeholder "Sales Assistant Voice", Description placeholder "Warm male Indian accent", Sample language Hindi, and a drag-and-drop upload area for audio files up to 10 MB.]

  1. Select the Clone Voice tab
     Switch to the Clone Voice tab in the dialog.
  2. Choose a Provider
     Select ElevenLabs or Cartesia.
  3. Enter Voice Details
     Add a Voice name (e.g., “Sales Assistant Voice”) and Description (e.g., “Warm male Indian accent”).
  4. Select Sample Language
     Choose the language of your audio sample.
  5. Upload an Audio Sample
     Drag and drop your audio file or click to browse. Audio files only, maximum 10 MB.
  6. Click Clone voice
     The platform processes your sample and adds the new voice to the Voice dropdown.

Supported Languages for Voice Cloning

Both ElevenLabs and Cartesia support the same set of languages for cloning:
Language              Code    Language     Code
English               en      Hindi        hi
Bengali               bn      Assamese     as
Dutch                 nl      French       fr
Gujarati              gu      Indonesian   id
Kannada               kn      Malay        ms
Malayalam             ml      Marathi      mr
Odia                  od      Punjabi      pa
Spanish               es      Tamil        ta
Telugu                te      Urdu         ur
Indian Multilingual   -

For best results, use a clean recording with no background noise, a single speaker, and at least 30 seconds of continuous speech.
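
Before uploading, the size limit and the recommended sample length can be checked locally. The Python sketch below is one way to do that under stated assumptions: the sample is an uncompressed WAV file (the standard-library `wave` module reads nothing else), and "10 MB" is read as 10 × 1024 × 1024 bytes. `check_clone_sample` is a hypothetical helper, not a Bolna tool, and it cannot detect background noise or multiple speakers.

```python
import os
import wave

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # "maximum 10 MB"; binary megabytes assumed
MIN_SECONDS = 30.0                   # recommended minimum of continuous speech


def check_clone_sample(path: str) -> list[str]:
    """Return a list of problems found in a WAV sample; empty means none."""
    problems: list[str] = []
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 10 MB")
    # Duration = number of frames divided by frames per second.
    with wave.open(path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if seconds < MIN_SECONDS:
        problems.append(
            f"only {seconds:.1f} s of audio; at least 30 s is recommended"
        )
    return problems
```

An empty result means the file passes both checks; any other audio format would need a different reader before this check applies.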

Next Steps

  • Engine Tab: Configure interruption handling, endpointing, and latency
  • Multilingual Support: Set up agents that speak multiple languages in a single call
  • Clone Voices: Create a custom voice from an audio sample
  • Deepgram Provider: Explore Deepgram transcription models and keyword boosting