Handytool
Audio · Free · Runs locally

Transcribe audio to text

Convert spoken audio in any language into text in your browser.

.mp3, .wav, .ogg, .m4a, .aac, .flac, .webm, .opus

Runs entirely in your browser.

About the Transcribe audio to text tool

Drop an audio file or record straight from your microphone, and get a written transcript in the same language the speaker used — no upload, no account, no app to install. Handytool runs OpenAI's open-source Whisper model directly in your browser using WebGPU when available, so your podcasts, interviews, voice notes, lectures, and meeting recordings stay fully private. Download the result as plain text, an SRT subtitle file, or a WebVTT file ready for video players.

Transcribe audio to text features

  • 01

    99 languages, auto-detected

    Whisper detects the spoken language and writes the transcript in that same language — Spanish stays Spanish, Japanese stays Japanese, German stays German. No language picker, no extra settings.

  • 02

    Upload a file or record live

    Bring an MP3, WAV, M4A, OGG, FLAC, or WebM file, or click Record voice to capture audio directly from your microphone. Stop when you're done and the recording flows straight into transcription.

  • 03

    Private, in-browser processing

    The Whisper model is downloaded once into your browser cache and runs entirely on your device with WebGPU acceleration where supported. Nothing is uploaded — your audio never leaves your computer.

Transcribe audio to text FAQ

How do I transcribe an audio file?
Drop your audio file (MP3, WAV, M4A, OGG, FLAC, or WebM) into the tool and click Transcribe. The first run downloads the speech model (~150 MB); after that, transcription runs locally without an internet connection.
Can I record audio with my microphone?
Yes. Click Record voice, allow microphone access in your browser, speak, and click Stop recording. The captured audio is treated like an uploaded file and you can transcribe it immediately.
Which languages are supported?
All 99 languages Whisper supports — including English, Spanish, Mandarin, French, Arabic, Hindi, German, Russian, Portuguese, Japanese, and many more. The transcript stays in whatever language was spoken.
How long can the audio be?
Files up to 200 MB are accepted. Long recordings are processed in 30-second chunks with 5-second overlap, so a one-hour podcast still produces a coherent transcript. Processing time depends on whether your browser supports WebGPU.
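
To make the chunking above concrete, here is a minimal sketch of how a recording might be split into overlapping windows. It assumes that "5-second overlap" means each 30-second window shares its first 5 seconds with the previous one, so windows start 25 seconds apart; the tool's actual scheme may differ. The names `ChunkWindow` and `chunkWindows` are hypothetical and are not taken from Handytool's code.

```typescript
// Hypothetical helper: names and defaults are illustrative assumptions.
interface ChunkWindow {
  start: number; // window start, in seconds
  end: number;   // window end, in seconds
}

function chunkWindows(
  durationSec: number,
  chunkSec: number = 30,
  overlapSec: number = 5
): ChunkWindow[] {
  if (durationSec <= 0) return [];
  if (overlapSec < 0 || overlapSec >= chunkSec) {
    throw new RangeError("overlap must be non-negative and shorter than the chunk length");
  }
  // Assumption: consecutive windows start (chunkSec - overlapSec) seconds apart.
  const step = chunkSec - overlapSec;
  const windows: ChunkWindow[] = [];
  for (let start = 0; start < durationSec; start += step) {
    const end = Math.min(start + chunkSec, durationSec);
    windows.push({ start, end });
    if (end === durationSec) break; // this window already reaches the end of the audio
  }
  return windows;
}
```

Under these assumptions a 70-second clip yields three windows (0 to 30, 25 to 55, 50 to 70), and a one-hour recording yields 144. The shared seconds give the transcriber context on both sides of each boundary, which is one way to avoid cutting words in half.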
Is the audio uploaded to a server?
No. Both the model and your audio stay in your browser. The Whisper model is fetched once from a CDN and cached, then transcription happens entirely on-device using WebGPU or WebAssembly.
Can I get subtitles for a video?
Yes. Download the .srt or .vtt file and drop it into your video editor, or upload it as a caption track on YouTube and other platforms. Each subtitle line includes the timestamp range Whisper detected.
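
As an illustration of what those subtitle files contain, the sketch below turns timestamped segments into SRT and WebVTT text. The `Segment` shape and the function names are assumptions made for this example, not Handytool's internals. The format details are standard: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header line and uses a period.

```typescript
// Hypothetical segment shape: start and end in seconds, plus the recognized text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SRT: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return ["WEBVTT\n"].concat(cues).join("\n");
}
```

For example, `toSrt([{ start: 0, end: 2.5, text: "Hello" }])` produces a cue numbered 1 with the range `00:00:00,000 --> 00:00:02,500`, followed by the text.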
