What is Deepgram Flux?

Deepgram Flux is Deepgram’s latest generation of speech-to-text models, purpose-built for real-time conversational AI. Unlike Nova models, which rely on external Voice Activity Detection (VAD) for turn boundaries, Flux models have turn detection built directly into the model. This produces a richer event stream that lets Bolna start responding sooner and handle barge-ins more accurately.

Why choose Deepgram Flux for voice AI transcription?

  • Speculative LLM responses: Bolna can start generating an LLM response on EagerEndOfTurn, before the speaker has fully stopped, which reduces perceived response time.
  • Accurate barge-in detection: The StartOfTurn event fires as soon as speech begins, allowing Bolna to interrupt playback without waiting on a separate VAD stage.
  • Language Identification (Flux Multi): flux-general-multi identifies the spoken language per turn and returns it alongside the transcript, enabling dynamic multilingual handling without pre-configuring a language.
  • Configurable turn sensitivity: End-of-turn thresholds and timeouts are exposed as tunable parameters, so you can balance responsiveness against accuracy for your specific use case.

Which Deepgram Flux models are supported on Bolna AI?

Model | Description
Flux (English) | English-only Flux model optimised for accuracy and latency
Flux (Multilingual) | Multilingual Flux model with built-in Language Identification

Configurable parameters

EndOfTurn Threshold (eot_threshold)

Controls how confident the model must be that the speaker has finished their turn before emitting a final transcript.

Value | Behaviour
0.5 | Responds sooner, higher chance of cutting off the speaker
0.7 | Default: balanced for most voice agent use cases
0.9 | Waits longer, reduces false endings on incomplete sentences

Range: 0.5 to 0.9 (step 0.05)
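
As a rough illustration of the range and step above, the check below rejects values that fall outside 0.5 to 0.9 or off the 0.05 grid. The function name `validate_eot_threshold` is hypothetical and not part of any Bolna or Deepgram SDK; it is only a sketch of how a client might pre-validate a value before submitting it.

```python
from decimal import Decimal

# Bounds and step taken from the documented range for eot_threshold.
EOT_MIN = Decimal("0.5")
EOT_MAX = Decimal("0.9")
EOT_STEP = Decimal("0.05")


def validate_eot_threshold(value: float) -> float:
    """Return the value unchanged if it is in range and on the 0.05 grid.

    Hypothetical helper: Decimal is used so that grid checks are not
    affected by binary floating-point rounding.
    """
    d = Decimal(str(value))
    if not (EOT_MIN <= d <= EOT_MAX):
        raise ValueError(f"eot_threshold {value} is outside 0.5 to 0.9")
    if (d - EOT_MIN) % EOT_STEP != 0:
        raise ValueError(f"eot_threshold {value} is not a multiple of 0.05")
    return float(d)
```

For example, `validate_eot_threshold(0.7)` passes, while `0.72` (off the grid) and `0.95` (out of range) raise `ValueError`.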

EndOfTurn Timeout (eot_timeout_ms)

Maximum silence duration (in milliseconds) the model waits after the last detected speech before forcing an EndOfTurn event. Acts as a safety net when the model’s confidence score alone is insufficient.

Value | Behaviour
300 ms to 900 ms | Aggressive: good for fast back-and-forth interactions
1 s | Default: works well for most voice agent use cases
2 s to 3 s | Patient: useful for agents that ask open-ended questions

Options: 300 ms, 400 ms, 500 ms, 600 ms, 700 ms, 800 ms, 900 ms, 1 s, 2 s, 3 s
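
Because the timeout is chosen from a fixed list rather than a continuous range, a client that computes a desired timeout may need to snap it to the closest listed option. The sketch below assumes the value is passed as an integer number of milliseconds (as the `eot_timeout_ms` name suggests); the helper name `nearest_allowed_timeout_ms` is hypothetical.

```python
# The ten options listed above, expressed in milliseconds.
ALLOWED_EOT_TIMEOUTS_MS = (300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000)


def nearest_allowed_timeout_ms(requested_ms: int) -> int:
    """Snap a requested timeout to the closest listed option.

    Hypothetical helper. Ties are resolved in favour of the shorter
    timeout, which is an arbitrary choice made for this sketch.
    """
    return min(
        ALLOWED_EOT_TIMEOUTS_MS,
        key=lambda option: (abs(option - requested_ms), option),
    )
```

Note that the options are sparse above 1 s, so a request for 1400 ms snaps down to 1000 ms while 1600 ms snaps up to 2000 ms.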

Eager EndOfTurn (eager_eot_threshold)

When enabled, Flux emits an EagerEndOfTurn event before the final EndOfTurn. Bolna uses this to start LLM inference speculatively: if the speaker continues (TurnResumed), the speculative request is cancelled; if the speaker stops (EndOfTurn), the response is already in flight.

Enable this toggle to activate eager turn detection, then set the Eager Threshold:

Value | Behaviour
0.3 | Triggers very early: maximum latency reduction, higher cancellation rate
0.5 | Default: good balance between speed and accuracy
0.9 | Triggers late: nearly as conservative as standard EndOfTurn

Range: 0.3 to 0.9 (step 0.05)

Enable Eager EndOfTurn with a threshold of 0.4 to 0.5 for the lowest perceived response latency. If you see frequent mid-sentence interruptions, raise the threshold or disable it.
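
The speculative flow described in this section can be pictured as a small state machine over the four event names mentioned on this page (StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn). The sketch below is illustrative only: the `SpeculativeTurnHandler` class, its action labels, and the way transcripts are passed in are assumptions made for this example, not Bolna's actual implementation or Deepgram's message format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeculativeTurnHandler:
    """Hypothetical handler that records the action taken for each event."""

    log: List[str] = field(default_factory=list)
    pending_transcript: Optional[str] = None  # transcript of the in-flight speculative request

    def on_event(self, event: str, transcript: str = "") -> None:
        if event == "StartOfTurn":
            # Speech began: interrupt any agent playback (barge-in).
            self.log.append("stop_playback")
        elif event == "EagerEndOfTurn":
            # Speaker has probably finished: start LLM inference early.
            self.pending_transcript = transcript
            self.log.append(f"start_llm:{transcript}")
        elif event == "TurnResumed":
            # Speaker kept talking: discard the speculative request.
            if self.pending_transcript is not None:
                self.log.append("cancel_llm")
                self.pending_transcript = None
        elif event == "EndOfTurn":
            if self.pending_transcript is not None:
                # The speculative response is already in flight, so keep it.
                self.log.append("use_speculative")
            else:
                # No speculation happened: start inference now.
                self.log.append(f"start_llm:{transcript}")
            self.pending_transcript = None
```

Feeding it the sequence StartOfTurn, EagerEndOfTurn, TurnResumed, EagerEndOfTurn, EndOfTurn produces one cancelled speculative request followed by one that is kept, which is the trade-off the Eager Threshold controls: lower values start inference earlier but cancel more often.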

Next steps

Ready to configure Deepgram Flux for your voice AI agent? Open the Audio tab in the Bolna Playground, select Flux (English) or Flux (Multilingual) as your transcriber model, and tune the parameters above. For related integrations: