Introduction
Bolna provides detailed latency metrics for every Voice AI execution, helping you monitor and optimize agent response speed. These metrics break down timing across the entire voice pipeline, from speech recognition to LLM processing to audio synthesis. Access latency data via the Get Execution API in the latency_data object.
Latency Data Overview
Top-Level Metrics
| Field | Type | Description |
|---|---|---|
| stream_id | float | Time (ms) to establish the audio stream connection |
| time_to_first_audio | float | Total time (ms) from call start to first audio played to the user |
| region | string | Geographic region code (e.g., in for India, us for United States) |
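A minimal sketch of reading these fields, assuming the Get Execution response has already been parsed into a Python dict and that latency_data sits at its top level. The sample payload and the helper name summarize_top_level are hypothetical and only illustrate the shape described in the table.

```python
from typing import Any, Dict, Optional

# Hypothetical, hand-written payload; real responses contain many more fields.
sample_execution: Dict[str, Any] = {
    "latency_data": {
        "time_to_first_audio": 1250.0,  # ms, made-up value
        "region": "in",
    }
}


def summarize_top_level(execution: Dict[str, Any]) -> Optional[str]:
    """Return a one-line summary of top-level latency metrics, or None if absent."""
    latency = execution.get("latency_data")
    if not latency:
        return None
    ttfa = latency.get("time_to_first_audio")
    region = latency.get("region", "unknown")
    if ttfa is None:
        return f"region={region}, time_to_first_audio unavailable"
    return f"region={region}, time_to_first_audio={ttfa:.0f} ms"


print(summarize_top_level(sample_execution))
```

Using .get() throughout keeps the helper tolerant of executions where latency data is missing or incomplete.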
Pipeline Component Metrics
- Transcriber
- LLM
- Synthesizer
Transcriber
Converts spoken audio into text. These metrics track how quickly speech is transcribed.
| Field | Type | Description |
|---|---|---|
| time_to_connect | integer | Time (ms) to establish connection with the transcriber |
| turn | integer | Sequential conversation turn number (starting at 1) |
| sequence_id | integer | Incremental transcription update ID within a turn |
| audio_to_text_latency | float | Time (ms) from audio input to transcribed text |
| text | string | Transcribed text for this sequence |
Multiple sequences per turn represent incremental refinements. The transcriber provides partial results that improve over time. The final sequence is the most accurate.
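Because the final sequence in a turn is the most accurate, it can be useful to keep only the last update per turn. The sketch below assumes the transcriber entries arrive as a flat list of dicts carrying the turn, sequence_id, audio_to_text_latency, and text fields. That list shape, the sample values, and the helper name final_sequence_per_turn are assumptions for illustration, not a documented structure.

```python
from typing import Any, Dict, List

# Hypothetical sample: the enclosing list and its values are illustrative;
# only the per-entry field names come from the transcriber table.
sample_sequences: List[Dict[str, Any]] = [
    {"turn": 1, "sequence_id": 1, "audio_to_text_latency": 42.0, "text": "hel"},
    {"turn": 1, "sequence_id": 2, "audio_to_text_latency": 55.5, "text": "hello there"},
    {"turn": 2, "sequence_id": 1, "audio_to_text_latency": 38.2, "text": "book a slot"},
]


def final_sequence_per_turn(sequences: List[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Map each turn number to the entry with the highest sequence_id."""
    final: Dict[int, Dict[str, Any]] = {}
    for seq in sequences:
        turn = seq["turn"]
        if turn not in final or seq["sequence_id"] > final[turn]["sequence_id"]:
            final[turn] = seq
    return final


for turn, seq in sorted(final_sequence_per_turn(sample_sequences).items()):
    print(turn, seq["text"], seq["audio_to_text_latency"])
```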
Identifying Bottlenecks
Use these thresholds to pinpoint performance issues across the pipeline:
High Transcriber Latency (>100ms per sequence)
Possible causes:
- Network issues with the transcription service
- Need for a different transcription provider
- Poor audio quality or background noise
High LLM Time to First Token (>1000ms)
Possible causes:
- LLM model is too large or complex
- Prompts need optimization (too long or unstructured)
- High load on the LLM service
High Synthesizer Latency (>500ms to first token)
Possible causes:
- Network issues with the TTS service
- Voice model is computationally expensive
- Provider experiencing high load
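The three thresholds can be applied programmatically. The sketch below takes plain millisecond values rather than field names, since the LLM and synthesizer field names are not listed on this page. The function name find_bottlenecks and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence

# Thresholds (ms) taken from the guidance in this section.
TRANSCRIBER_MAX_MS = 100.0
LLM_FIRST_TOKEN_MAX_MS = 1000.0
SYNTHESIZER_FIRST_TOKEN_MAX_MS = 500.0


def find_bottlenecks(
    transcriber_ms: Sequence[float],
    llm_first_token_ms: Optional[float] = None,
    synthesizer_first_token_ms: Optional[float] = None,
) -> List[str]:
    """Return the names of pipeline components that exceed their threshold."""
    flagged: List[str] = []
    # The transcriber threshold applies per sequence, so any slow sequence flags it.
    if any(ms > TRANSCRIBER_MAX_MS for ms in transcriber_ms):
        flagged.append("transcriber")
    if llm_first_token_ms is not None and llm_first_token_ms > LLM_FIRST_TOKEN_MAX_MS:
        flagged.append("llm")
    if (
        synthesizer_first_token_ms is not None
        and synthesizer_first_token_ms > SYNTHESIZER_FIRST_TOKEN_MAX_MS
    ):
        flagged.append("synthesizer")
    return flagged


# Made-up timings: one slow transcriber sequence and a slow synthesizer.
print(find_bottlenecks([42.0, 130.0], llm_first_token_ms=850.0,
                       synthesizer_first_token_ms=620.0))
```

A flagged component is only a starting point; the possible causes listed for each threshold still need to be checked individually.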
Related Pages
- Get Execution API: Retrieve execution details with latency data
- Call Status List: Track the full lifecycle of your calls
- Hangup Status Codes: Understand call termination reasons

