

What is OpenAI Realtime STT?

OpenAI’s Realtime API provides a WebSocket-based streaming speech-to-text service. Unlike batch transcription, audio is streamed continuously and transcripts are returned with low latency as the caller speaks, making it well-suited for live voice agent conversations.
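A transcription session is configured by sending a JSON event over the WebSocket once it is open. The sketch below is modeled on the general shape of OpenAI's Realtime session events; the field and model names are illustrative assumptions, so check the current API reference before relying on them.

```python
import json

# Illustrative transcription-session config payload. Field names and the
# model identifier are assumptions modeled on OpenAI's Realtime API shape.
session_update = {
    "type": "transcription_session.update",
    "input_audio_transcription": {
        "model": "gpt-4o-transcribe",   # assumed model identifier
        "language": "en",               # ISO 639-1 code
    },
    "turn_detection": {"type": "server_vad"},               # built-in VAD
    "input_audio_noise_reduction": {"type": "near_field"},  # optional
}

# The payload would be serialized and sent over the open WebSocket.
print(json.dumps(session_update, indent=2))
```

In the Bolna Playground this payload is assembled for you from the Audio tab settings; the sketch only shows what those settings map to conceptually.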

Why choose OpenAI Realtime for voice AI transcription?

  • Ultra-low latency: Streams audio and returns interim transcript deltas as the caller speaks, with final results delivered on turn boundaries.
  • Built-in server VAD: GPT Realtime Whisper has voice activity detection built in — it automatically detects speech start and end without any manual configuration.
  • Streaming interim transcripts: Partial transcripts arrive as the caller is still speaking, giving your agent an early signal to start processing.
  • Optional noise reduction: Near-field noise reduction can be enabled to improve accuracy in office or call-centre environments.
  • Delay tuning: The delay parameter lets you trade off transcription latency against accuracy — useful for noisy environments or accented speech.
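The interim/final flow described above can be sketched as a small accumulator: deltas extend a partial transcript, and a turn boundary commits it as a final result. The event shapes here are simplified stand-ins, not the exact Realtime event schema.

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptAssembler:
    """Accumulates interim deltas and commits a final on each turn boundary."""
    partial: str = ""
    finals: list = field(default_factory=list)

    def on_event(self, event: dict) -> None:
        if event["type"] == "delta":          # interim transcript fragment
            self.partial += event["text"]
        elif event["type"] == "completed":    # turn boundary: commit final
            self.finals.append(self.partial + event["text"])
            self.partial = ""

# Simulated event stream for one caller turn
asm = TranscriptAssembler()
for ev in [{"type": "delta", "text": "book a "},
           {"type": "delta", "text": "demo "},
           {"type": "completed", "text": "tomorrow"}]:
    asm.on_event(ev)

print(asm.finals)   # -> ['book a demo tomorrow']
```

The partial text is what gives the agent an early signal; only the committed finals should drive downstream LLM turns.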

Which OpenAI models are supported?

  • GPT Realtime Whisper: GA streaming transcription model, natively designed for real-time sessions with built-in VAD.

Configurable parameters

These parameters appear in the Audio tab of the Bolna Playground when OpenAI is selected as the transcriber provider.

Transcription Delay (delay)

Controls the trade-off between transcription speed and accuracy. A lower delay emits results sooner; a higher delay gives the model more audio context before committing to a transcript.
Value and behaviour:

  • minimal: fastest; best for simple, clean audio
  • low: low latency with good accuracy
  • medium: default; balanced for most use cases
  • high: higher accuracy; useful for accents or background noise
  • xhigh: maximum accuracy; highest latency
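The trade-off in the table can be encoded as a small selection helper. The tiers mirror the documented values; the selection logic itself is a hypothetical example, not a Bolna or OpenAI recommendation.

```python
# Hypothetical helper: pick a delay setting from call conditions, following
# the speed/accuracy trade-off documented above.
DELAY_VALUES = ("minimal", "low", "medium", "high", "xhigh")

def pick_delay(noisy: bool = False, accented: bool = False) -> str:
    if noisy and accented:
        return "xhigh"   # maximum accuracy, highest latency
    if noisy or accented:
        return "high"    # extra audio context before committing
    return "medium"      # documented default: balanced

print(pick_delay())             # -> medium
print(pick_delay(noisy=True))   # -> high
```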

Noise Reduction

Toggle on to enable near-field noise reduction. Recommended for call-centre or office environments where background noise is common.

Supported languages

GPT Realtime Whisper supports a wide range of languages. Pass the ISO 639-1 code as the language parameter when configuring the transcriber. Common supported languages include: en, es, fr, de, hi, pt, ja, it, nl, zh, ko, ar, ru.
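A hypothetical guard like the one below can catch a bad language code before the transcriber is configured. The set lists only the codes named on this page; GPT Realtime Whisper supports more.

```python
# Codes taken from the "Common supported languages" list above (not exhaustive).
SUPPORTED = {"en", "es", "fr", "de", "hi", "pt", "ja",
             "it", "nl", "zh", "ko", "ar", "ru"}

def normalize_language(code: str) -> str:
    """Normalize an ISO 639-1 code and reject ones not in the known set."""
    code = code.strip().lower()
    if code not in SUPPORTED:
        raise ValueError(f"Unsupported or unknown ISO 639-1 code: {code!r}")
    return code

print(normalize_language("EN"))   # -> en
```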

Next steps

Ready to configure OpenAI transcription for your voice AI agent? Open the Audio tab in the Bolna Playground, select openai as the transcriber provider and GPT Realtime Whisper as the model, then tune the parameters above. For related integrations: