# Configure Voice and Transcription Settings

> Set up speech-to-text and text-to-speech for Bolna Voice AI agents. Choose providers like Deepgram and ElevenLabs, and fine-tune voice quality.

## What is the Audio Tab?

The Audio Tab is where you configure how your agent listens and speaks. Set up language preferences, choose transcription providers for speech-to-text, and select voice synthesizers for natural-sounding responses.

<Frame caption="Audio Tab on Bolna Playground">
  <img src="https://mintcdn.com/bolna-54a2d4fe/uH9lQxF0tYMrhiL9/images/getting-started/agent-setup/audio-tab.png?fit=max&auto=format&n=uH9lQxF0tYMrhiL9&q=85&s=b88f505ea43d39a35f781d8a19681aa9" alt="Audio Tab showing language, speech-to-text, and text-to-speech configuration" width="1024" height="932" data-path="images/getting-started/agent-setup/audio-tab.png" />
</Frame>

***

## Configuration Options

### Configure Language

Set your agent's primary language and enable multilingual support.

<Frame caption="Language Configuration">
  <img src="https://mintcdn.com/bolna-54a2d4fe/uH9lQxF0tYMrhiL9/images/getting-started/agent-setup/audio-language.png?fit=max&auto=format&n=uH9lQxF0tYMrhiL9&q=85&s=ef64dd503a90c835e1c3d822221e2bde" alt="Configure Language section with language dropdown and Auto language switch toggle" width="1024" height="135" data-path="images/getting-started/agent-setup/audio-language.png" />
</Frame>

<CardGroup cols={2}>
  <Card title="Language Selection" icon="globe">
    Choose the agent's primary language (for example, English, Hindi, or Spanish).
  </Card>

  <Card title="Auto Language Switch" icon="arrows-rotate">
    Automatically detect and switch languages during calls
  </Card>
</CardGroup>

<Info>
  Enable **Auto Language Switch** for multilingual support. Your agent will detect the caller's language and respond in that language.
</Info>

***

### Speech-to-Text (Transcription)

Configure how your agent converts spoken words into text.

<Frame caption="Speech-to-Text Settings">
  <img src="https://mintcdn.com/bolna-54a2d4fe/uH9lQxF0tYMrhiL9/images/getting-started/agent-setup/audio-stt.png?fit=max&auto=format&n=uH9lQxF0tYMrhiL9&q=85&s=c5c33fa5be01a9e13311324fea063ee7" alt="Speech-to-Text section showing Provider dropdown, Model selection, and Keywords field" width="1024" height="313" data-path="images/getting-started/agent-setup/audio-stt.png" />
</Frame>

<Steps>
  <Step title="Select Provider">
    Choose your transcription provider (e.g., Deepgram, Azure).
  </Step>

  <Step title="Select Model">
    Pick the model (e.g., `nova-3` when accuracy is the priority).
  </Step>

  <Step title="Add Keywords (Optional)">
    Boost recognition of specific terms like names or brand words.
  </Step>
</Steps>

<Tip>
  **Keywords improve accuracy.** Add names, brand terms, or technical words, each with a boost value. Format: `word:boost_value` (e.g., `Bruce:100`).
</Tip>
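
The `word:boost_value` format from the tip above can be built and checked with a small helper before you paste entries into the Keywords field. This is a minimal sketch, not part of Bolna: the function names `format_keyword` and `parse_keyword` are hypothetical, and it assumes boost values are whole numbers, as in the `Bruce:100` example.

```python
def format_keyword(word: str, boost: int) -> str:
    """Return one keyword entry in the `word:boost_value` form."""
    word = word.strip()
    # Assumption: a keyword must be non-empty and must not contain ":" itself.
    if not word or ":" in word:
        raise ValueError(f"invalid keyword: {word!r}")
    return f"{word}:{boost}"


def parse_keyword(entry: str) -> tuple[str, int]:
    """Split a `word:boost_value` entry into its word and integer boost."""
    word, sep, boost = entry.strip().rpartition(":")
    if not sep or not word:
        raise ValueError(f"expected word:boost_value, got {entry!r}")
    # Assumption: boost values are integers, as in `Bruce:100`.
    return word, int(boost)


entries = [format_keyword("Bruce", 100), format_keyword("Bolna", 80)]
print(entries)  # ['Bruce:100', 'Bolna:80']
```

Checking entries this way catches a missing boost value or a stray colon before the keyword list reaches the transcription provider.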

***

### Text-to-Speech (Voice)

Configure how your agent sounds with voice synthesis settings.

<Frame caption="Text-to-Speech Settings">
  <img src="https://mintcdn.com/bolna-54a2d4fe/uH9lQxF0tYMrhiL9/images/getting-started/agent-setup/audio-tts.png?fit=max&auto=format&n=uH9lQxF0tYMrhiL9&q=85&s=01479620b2a9ae54390ddd4fdbd38e17" alt="Text-to-Speech section showing Provider, Model, Voice selection, and voice tuning sliders" width="1024" height="374" data-path="images/getting-started/agent-setup/audio-tts.png" />
</Frame>

<Steps>
  <Step title="Select Provider">
    Choose your voice synthesis provider (e.g., ElevenLabs, Azure).
  </Step>

  <Step title="Select Model">
    Pick the model (e.g., `eleven_turbo_v2_5` for low latency).
  </Step>

  <Step title="Choose Voice">
    Select a specific voice. Click ▶️ to preview it.
  </Step>
</Steps>

<Tip>
  Click **"Add voices"** to import or clone custom voices for a unique brand experience.
</Tip>

***

### Voice Tuning

Fine-tune your agent's voice quality with these settings.

| Setting                | Description                                        | Recommended                                          |
| ---------------------- | -------------------------------------------------- | ---------------------------------------------------- |
| **Buffer Size**        | Amount of audio buffered before playback starts    | 200 (balance of quality and speed)                   |
| **Speed Rate**         | Speaking speed                                     | 1.0 (natural pace)                                   |
| **Similarity Boost**   | How closely the output matches the original voice  | 0.75 (close to original voice)                       |
| **Stability**          | How consistent the voice stays across responses    | 0.5 (balanced expression)                            |
| **Style Exaggeration** | How strongly the voice's speaking style is applied | 0 (neutral; increase for more expressive delivery)   |

<Warning>
  **Balance is key!** A larger buffer size improves quality but increases latency. Test different settings to find the right balance for your use case.
</Warning>
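
When testing different settings, it can help to record what you changed relative to the recommended values in the table above. The sketch below is illustrative only: the `VoiceTuning` class, its field names, and `changed_from_recommended` are hypothetical and do not correspond to a Bolna API. Only the default values come from the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceTuning:
    # Defaults mirror the "Recommended" column; field names are hypothetical.
    buffer_size: int = 200
    speed_rate: float = 1.0
    similarity_boost: float = 0.75
    stability: float = 0.5
    style_exaggeration: float = 0.0


def changed_from_recommended(settings: VoiceTuning) -> dict[str, tuple[float, float]]:
    """Map each changed setting to a (recommended, current) pair."""
    recommended = asdict(VoiceTuning())
    return {
        name: (recommended[name], value)
        for name, value in asdict(settings).items()
        if value != recommended[name]
    }


# Example: a more expressive voice, everything else left at the recommended value.
expressive = VoiceTuning(style_exaggeration=0.4)
print(changed_from_recommended(expressive))  # {'style_exaggeration': (0.0, 0.4)}
```

Keeping a record like this alongside each test call makes it easier to trace a change in latency or voice quality back to the setting that caused it.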

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Agent Tab" icon="file-lines" href="/agent-setup/agent-tab">
    Configure prompts and welcome message
  </Card>

  <Card title="Engine Tab" icon="gear" href="/agent-setup/engine-tab">
    Configure transcription and latency
  </Card>

  <Card title="Clone Voices" icon="waveform" href="/clone-voices">
    Create custom voice clones
  </Card>

  <Card title="Deepgram" icon="microphone" href="/providers/transcriber/deepgram">
    Learn about transcription options
  </Card>
</CardGroup>

