Features

Realtime audio transcription	Offline audio transcription
GPU acceleration	Flash Attention
Voice Activity Detection (VAD)	Quantized models
99 languages	Model downloader

After installing the addon:

Activate the extension in Project -> Project Settings -> Godot Whisper, then restart the Godot editor.

Models

Models can also be downloaded manually from Hugging Face.

Model	Size
tiny	78 MB
base	148 MB
small	244M
medium	769M
large-v1	1550M
large-v2	1550M
large-v3	1550M
large-v3-turbo	809M

Global settings

Go to Project -> Project Settings -> General -> Audio -> Input (enable Advanced Settings to show all options).

The addon's settings are listed there.

Microphone transcription feeds Whisper at 16000 Hz. The addon resamples captured audio from the actual runtime mix rate reported by AudioServer.get_mix_rate().

Optional: set Project Settings -> Audio -> Driver -> Mix Rate (audio/driver/mix_rate) to 16000 to avoid resampling overhead. This may reduce overall game audio quality, so only use it if speech transcription is the main audio workload. Godot may still use a different runtime mix rate on some platforms or devices; verify with AudioServer.get_mix_rate(). If the runtime mix rate is not 16000, the addon will resample.
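
To make the resampling step above concrete, here is a minimal Python sketch that converts mono samples from an arbitrary mix rate to the 16000 Hz that Whisper expects. It is an illustration only: the function name resample_linear and the choice of linear interpolation are assumptions, and the addon's actual resampler is not described here and may work differently.

```python
# Hypothetical illustration of resampling captured audio to 16000 Hz.
# This is NOT the addon's implementation; names and method are assumptions.

WHISPER_RATE = 16000  # Hz, the input rate stated in the text above


def resample_linear(samples, src_rate, dst_rate=WHISPER_RATE):
    """Resample a mono sequence of floats by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        # Same rate (or too short to interpolate): return an unchanged copy.
        return list(samples)

    last = len(samples) - 1
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample

    out = []
    for i in range(out_len):
        pos = i * step                  # fractional position in the source
        left = min(int(pos), last)      # sample at or before pos
        right = min(left + 1, last)     # next sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # One second of silence at 48000 Hz becomes 16000 samples.
    one_second = [0.0] * 48000
    print(len(resample_linear(one_second, 48000)))
```

When the source rate already equals 16000, the function returns the samples unchanged, which mirrors why setting the mix rate to 16000 avoids the resampling cost. Plain linear interpolation applies no low-pass filter, so downsampling this way can alias high frequencies; a production resampler would normally filter first.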