This page focuses on where Deepgram fits in a production voice stack. For full setup steps, credentials, and API details, use the documentation link above.
Overview
Deepgram is a speech-to-text (STT) provider built for real-time voice applications. It powers the listening capability of your Bolna voice agents, converting spoken words into text, accurately and with low latency, that your LLM can understand and respond to.
Deepgram's models are trained on diverse datasets, including phone conversations, which makes them well suited to voice agents, where a single misheard word can derail the conversation.
Features & Use Cases
Highly Accurate (Nova-2 Model)
Nova-2 transcribes with over 95% accuracy and holds up across different accents, background noise, and varying phone call quality.
Lightning-Fast Speed
It turns speech into text in under 300 milliseconds. This real-time speed prevents awkward pauses and keeps conversations smooth.
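To keep latency low, streaming STT services usually send interim hypotheses first and mark segments as final later. The sketch below shows one way a consumer might assemble completed utterances from such a stream. The message fields used here (`type`, `is_final`, `speech_final`, `channel.alternatives[0].transcript`) are assumed from Deepgram's streaming API, and `finalized_utterances` is a hypothetical helper, not part of Bolna or any Deepgram SDK.

```python
from typing import Iterable, Iterator


def finalized_utterances(messages: Iterable[dict]) -> Iterator[str]:
    """Yield one string per completed utterance from streaming result messages.

    Assumed message shape (verify against the Deepgram reference):
    {"type": "Results", "is_final": bool, "speech_final": bool,
     "channel": {"alternatives": [{"transcript": str}]}}
    """
    parts: list[str] = []
    for msg in messages:
        if msg.get("type") != "Results":
            continue  # skip metadata and other non-transcript messages
        alternatives = msg.get("channel", {}).get("alternatives", [])
        text = alternatives[0].get("transcript", "") if alternatives else ""
        if msg.get("is_final") and text:
            # Interim hypotheses are ignored; only finalized segments are kept.
            parts.append(text)
        if msg.get("speech_final") and parts:
            # The speaker has paused: hand the full utterance to the LLM.
            yield " ".join(parts)
            parts = []
```

Waiting for the end-of-speech flag before calling the LLM trades a little latency for complete sentences; an agent that needs to react faster could act on finalized segments as they arrive.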
Custom Words & Jargon
You can teach the AI your specific industry terms, product names, or technical jargon so it always recognizes them correctly.
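As a rough illustration of how custom terms might be passed when calling Deepgram directly, the sketch below encodes a term-to-boost mapping as repeated `keywords=term:boost` query parameters. The `keywords` parameter and its `term:boost` format are assumed from Deepgram's API for Nova-2; `keyword_query` and the example terms are hypothetical, and Bolna may expose this setting differently.

```python
from urllib.parse import urlencode


def keyword_query(terms: dict[str, float]) -> str:
    """Encode {term: boost} pairs as repeated `keywords=term:boost` parameters.

    The parameter name and format are assumptions based on Deepgram's API;
    check the current reference before relying on them.
    """
    # A list of tuples lets the same key appear once per term.
    pairs = [("keywords", f"{term}:{boost:g}") for term, boost in terms.items()]
    return urlencode(pairs)


# Hypothetical product names with illustrative boost values.
query = keyword_query({"Bolna": 2, "Nova-2": 1.5})
```

The resulting string can be appended to the other query parameters of the transcription request.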
Speaks Many Languages
Understands over 30 languages - including English, Spanish, French, German, and Hindi - and accurately handles native accents.
Clean, Readable Text
Automatically adds punctuation, capitalizes the right words, and formats numbers so the final text is clean and ready to use.
Optimized for Phone Calls
Specially trained on typical phone call audio, which makes it a strong fit for call centers and customer support.
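Phone audio typically arrives as 8 kHz mu-law, so a direct streaming request has to declare that format. The sketch below builds such a request URL, also enabling the punctuation and formatting features described above. The endpoint and parameter names (`model`, `encoding`, `sample_rate`, `channels`, `punctuate`, `smart_format`) are assumed from Deepgram's streaming API; `telephony_listen_url` is a hypothetical helper, and the right values depend on your telephony provider.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; confirm against the Deepgram reference.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def telephony_listen_url(language: str = "en") -> str:
    """Build a streaming URL for 8 kHz mu-law phone audio (illustrative values)."""
    params = {
        "model": "nova-2",
        "language": language,
        "encoding": "mulaw",     # common encoding for telephony media streams
        "sample_rate": 8000,     # narrowband phone audio
        "channels": 1,
        "punctuate": "true",     # add punctuation and capitalization
        "smart_format": "true",  # format numbers, dates, and similar entities
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"
```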
Knows Who is Speaking
Can automatically figure out and separate who is talking at any given moment, which is perfect for distinguishing the agent from the customer.
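When speaker separation is enabled, each transcribed word carries a speaker label, and consecutive words from the same speaker can be merged into turns. The sketch below assumes the word objects have `word`, `punctuated_word`, and integer `speaker` fields, as in Deepgram's diarized responses; `speaker_turns` and the sample data are hypothetical.

```python
from itertools import groupby


def speaker_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Merge consecutive words with the same speaker label into turns.

    Assumes each word dict has "word" and "speaker", and optionally
    "punctuated_word" (field names taken from Deepgram's response format).
    """
    turns = []
    # groupby only merges adjacent items, which preserves turn order.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns


# Made-up sample data for illustration only.
sample = [
    {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
    {"word": "how", "speaker": 0},
    {"word": "can", "speaker": 0},
    {"word": "i", "punctuated_word": "I", "speaker": 0},
    {"word": "help", "punctuated_word": "help?", "speaker": 0},
    {"word": "my", "punctuated_word": "My", "speaker": 1},
    {"word": "order", "speaker": 1},
    {"word": "is", "speaker": 1},
    {"word": "late", "punctuated_word": "late.", "speaker": 1},
]
turns = speaker_turns(sample)
```

Speaker labels are arbitrary integers, so mapping them to "agent" and "customer" still requires a rule of your own, such as treating whoever speaks first as the agent.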