How do I transcribe an audio file?

Drop your audio file (MP3, WAV, M4A, OGG, FLAC, or WebM) into the tool and click Transcribe. The first run downloads the speech model (~150 MB); after that, transcription runs locally without an internet connection.
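Before transcription starts, a drop zone like this one typically checks the file's format and size. The sketch below shows one way to do that. It is not the tool's actual code. It assumes the format is judged by file extension, uses the formats named in the answer above, and reads the 200 MB limit (stated under "How long can the audio be?") as 200 × 1024 × 1024 bytes. The name `checkAudioFile` is hypothetical.

```typescript
// Hypothetical pre-flight check; not the tool's real implementation.
// Assumption: format is decided by file extension only.
const ALLOWED_EXTENSIONS = ["mp3", "wav", "m4a", "ogg", "flac", "webm"];
// Assumption: "200 MB" means 200 * 1024 * 1024 bytes.
const MAX_BYTES = 200 * 1024 * 1024;

// A browser File object satisfies this shape (it has name and size).
interface FileInfo {
  name: string;
  size: number; // bytes
}

// Returns an error message, or null if the file can be transcribed.
function checkAudioFile(file: FileInfo): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return `Unsupported format: ${ext ? "." + ext : "no extension"}`;
  }
  if (file.size > MAX_BYTES) {
    return "File is larger than 200 MB";
  }
  return null;
}
```

Lower-casing the extension means `talk.MP3` passes the check just as `talk.mp3` does.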

Can I record audio with my microphone?

Yes. Click Record voice, allow microphone access in your browser, speak, and click Stop recording. The captured audio is treated like an uploaded file and you can transcribe it immediately.
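Browsers usually implement this record-then-stop flow with the standard `getUserMedia` and `MediaRecorder` APIs. The sketch below is an illustration under that assumption, not the tool's own code. The function name `startRecording` is hypothetical. The returned Blob's MIME type depends on the browser.

```typescript
// Hypothetical helper: starts recording from the default microphone and
// returns a function that stops recording and resolves to the captured audio.
async function startRecording(): Promise<() => Promise<Blob>> {
  // Triggers the browser's "allow microphone access" prompt.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  // Collect encoded audio as the recorder produces it.
  recorder.ondataavailable = (e: BlobEvent) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  recorder.start();

  // Calling the returned function corresponds to "Stop recording".
  return () =>
    new Promise<Blob>((resolve) => {
      recorder.onstop = () => {
        stream.getTracks().forEach((t) => t.stop()); // release the microphone
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
}
```

The resulting Blob can then go through the same path as a dropped file. This matches the answer's point that captured audio is treated like an upload.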

Which languages are supported?

All 99 languages Whisper supports — including English, Spanish, Mandarin, French, Arabic, Hindi, German, Russian, Portuguese, Japanese, and many more. The transcript stays in whatever language was spoken.

How long can the audio be?

Files up to 200 MB are accepted. Long recordings are processed in 30-second chunks, with a 5-second overlap between consecutive chunks, so a one-hour podcast still produces a coherent transcript. Processing is faster when your browser supports WebGPU; without it, transcription falls back to WebAssembly on the CPU and takes longer.
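The answer above does not say exactly how the overlap is counted. The sketch below assumes that consecutive chunks share 5 seconds of audio, so each chunk starts 25 seconds after the previous one. The tool's real chunking code is not shown and may differ. For example, some libraries apply a stride to each side of a chunk. The name `chunkWindows` is hypothetical.

```typescript
interface ChunkWindow {
  start: number; // seconds
  end: number; // seconds
}

// Hypothetical helper: splits a recording into fixed-length windows
// where each window overlaps the previous one by overlapSec seconds.
function chunkWindows(durationSec: number, chunkSec = 30, overlapSec = 5): ChunkWindow[] {
  if (overlapSec >= chunkSec) {
    throw new RangeError("overlap must be shorter than the chunk length");
  }
  const stride = chunkSec - overlapSec; // 25 s with the defaults
  const windows: ChunkWindow[] = [];
  for (let start = 0; start < durationSec; start += stride) {
    const end = Math.min(start + chunkSec, durationSec); // clip the last window
    windows.push({ start, end });
    if (end >= durationSec) break; // this window already reaches the end
  }
  return windows;
}
```

With these defaults, a one-hour (3,600 s) recording becomes 144 overlapping windows. The overlap gives the model context on both sides of each boundary, which helps avoid words being cut in half between chunks.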

Is the audio uploaded to a server?

No. Your audio never leaves your browser. The Whisper model is fetched once from a CDN and cached locally; after that, transcription happens entirely on-device using WebGPU or, where WebGPU is unavailable, WebAssembly.
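Choosing between WebGPU and WebAssembly usually starts with feature detection. The sketch below is an assumption about how such a choice could be made, not the tool's code. It relies only on the standard behaviour of `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available. The names `GpuLike` and `pickBackend` are hypothetical.

```typescript
// Minimal structural stand-in for navigator.gpu, so this compiles without
// the @webgpu/types package. The real requestAdapter() resolves to a
// GPUAdapter, or to null when no suitable adapter exists.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

type Backend = "webgpu" | "wasm";

// Hypothetical helper: prefer WebGPU, fall back to WebAssembly.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  // Browsers without WebGPU support do not expose navigator.gpu at all.
  if (!gpu) return "wasm";
  try {
    const adapter = await gpu.requestAdapter();
    // The API can exist even when no usable GPU adapter is available.
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}

// Browser usage (hypothetical wiring):
// const backend = await pickBackend((navigator as unknown as { gpu?: GpuLike }).gpu);
```

Passing the GPU object in as a parameter, rather than reading `navigator` directly, keeps the decision logic testable outside a browser.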

Can I get subtitles for a video?

Yes. Download the .srt or .vtt file and drop it into your video editor, or upload it as a caption track on YouTube or another platform. Each subtitle cue carries the start and end timestamps Whisper detected for that segment.
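The two subtitle formats differ mainly in their headers and timestamp punctuation. The sketch below shows how timed segments could be written out as SRT and WebVTT. It assumes each segment has a start time, an end time in seconds, and its text. The `Segment` type and the function names are hypothetical, not the tool's internals.

```typescript
// Hypothetical segment shape: times in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as HH:MM:SS followed by sep and milliseconds.
// SRT uses "," before the milliseconds; WebVTT uses ".".
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms, 3)}`;
}

// SRT: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by blank lines.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return ["WEBVTT\n", ...cues].join("\n");
}
```

For example, `formatTimestamp(3661.5, ",")` gives `01:01:01,500`, the SRT form of one hour, one minute, and 1.5 seconds.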

Transcribe audio to text

Convert spoken audio in any language into text in your browser.

Accepted file types: .mp3, .wav, .ogg, .m4a, .aac, .flac, .webm, .opus

Runs entirely in your browser.

Drop an audio file here

MP3 · WAV · OGG · M4A · FLAC · WebM · max 200 MB

First run downloads ~150 MB; cached afterwards.

About the Transcribe audio to text tool

Drop an audio file or record straight from your microphone and get a written transcript in the same language the speaker used, with no upload, no account, and no app to install. Handytool runs OpenAI's open-source Whisper model directly in your browser, using WebGPU when available and WebAssembly otherwise, so your podcasts, interviews, voice notes, lectures, and meeting recordings stay fully private. Download the result as plain text, an SRT subtitle file, or a WebVTT file ready for video players.

Transcribe audio to text features

01
99 languages, auto-detected
Whisper detects the spoken language and writes the transcript in that same language — Spanish stays Spanish, Japanese stays Japanese, German stays German. No language picker, no extra settings.
02
Upload a file or record live
Bring an MP3, WAV, M4A, OGG, FLAC, or WebM file, or click Record voice to capture audio directly from your microphone. Stop when you're done and the recording flows straight into transcription.
03
Private, in-browser processing
The Whisper model is downloaded once into your browser cache and runs entirely on your device with WebGPU acceleration where supported. Nothing is uploaded — your audio never leaves your computer.
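The "download once, then cache" behaviour described above can be implemented with a cache-first lookup. The sketch below assumes the browser Cache API is used; the tool's actual caching mechanism is not stated. It shows the pattern with an injected cache and fetcher so the logic is easy to follow. The names `CacheLike` and `fetchModelFile` and any cache name you pass to `caches.open` are hypothetical.

```typescript
// Minimal subset of the browser Cache API used here; a real Cache
// (from `await caches.open("some-name")`) satisfies this interface.
interface CacheLike {
  match(url: string): Promise<Response | undefined>;
  put(url: string, response: Response): Promise<void>;
}

// Hypothetical cache-first loader for a model file.
async function fetchModelFile(
  url: string,
  cache: CacheLike,
  fetcher: (u: string) => Promise<Response> = fetch,
): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached) return cached.arrayBuffer(); // later runs: no network request

  const response = await fetcher(url); // first run: download from the network
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  await cache.put(url, response.clone()); // store a copy for next time
  return response.arrayBuffer();
}
```

Only the first call downloads the file; every later call with the same URL is served from the cache, which is why transcription can then run offline.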

Guides

How to Transcribe Audio to Text Online
Audio guide · 5 min read · Updated May 1, 2026
Turn voice memos, interviews, and recordings into searchable text in your browser, with on-device speech recognition.

Related tools

Voice enhancer

Voice isolator

Trim audio

Explore other tools

PDF to PNG

Japan Visa Photo Maker

Subtitle Burner

JSON viewer

Grammar checker