How do I transcribe a video file?

Drop your video file (MP4, MOV, WebM, MKV, M4V, or AVI) into the tool and click Transcribe. The audio is extracted locally with FFmpeg, then passed to Whisper. The first run downloads the speech model (~150 MB); after that, transcription works offline.
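
As a rough sketch of the extraction step, the helper below builds an FFmpeg argument list that drops the video stream and writes 16 kHz mono PCM audio, the input format Whisper models work with. The function name and file names are illustrative assumptions, not Handytool's actual code; in an FFmpeg.wasm setup an array like this would typically be handed to the library's command-execution call.

```typescript
// Hypothetical helper (not from the tool's source): builds FFmpeg arguments
// that extract the audio track as 16 kHz mono 16-bit PCM.
function buildExtractAudioArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName,      // input video file
    "-vn",                // ignore the video stream
    "-ac", "1",           // downmix to a single (mono) channel
    "-ar", "16000",       // resample to 16 kHz
    "-c:a", "pcm_s16le",  // encode as 16-bit little-endian PCM
    outputName,           // e.g. a .wav file
  ];
}
```

Calling `buildExtractAudioArgs("talk.mp4", "talk.wav")` yields the equivalent of `ffmpeg -i talk.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le talk.wav`.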

Can I generate subtitles for YouTube?

Yes. After transcribing, click Download .srt or Download .vtt — both formats are accepted by YouTube Studio's caption uploader, as well as Vimeo, LinkedIn, and most video editors.
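
For readers curious what the two downloads contain, here is a minimal sketch of turning timestamped chunks into SRT and WebVTT text. The `Chunk` shape and the function names are assumptions made for illustration; the tool's internal data structures are not documented here. The main difference between the formats is the `WEBVTT` header, the cue numbering in SRT, and the millisecond separator (comma in SRT, period in WebVTT).

```typescript
// Assumed chunk shape: start/end in seconds plus the recognized text.
interface Chunk { start: number; end: number; text: string; }

// Formats seconds as HH:MM:SS<sep>mmm (SRT uses ",", WebVTT uses ".").
function formatTimestamp(seconds: number, sep: string): string {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor(totalMs / 60000) % 60;
  const s = Math.floor(totalMs / 1000) % 60;
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms, 3)}`;
}

// SRT: numbered cues separated by a blank line.
function toSrt(chunks: Chunk[]): string {
  return chunks
    .map((c, i) =>
      `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text.trim()}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by a blank line.
function toVtt(chunks: Chunk[]): string {
  const cues = chunks.map((c) =>
    `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text.trim()}\n`);
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```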

Which video formats are supported?

MP4, MOV, WebM, MKV, M4V, and AVI containers up to 500 MB. The audio track inside the video is what matters — common codecs like AAC, MP3, Opus, and Vorbis all work.
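
The limits above could be checked before any processing starts. The sketch below is one way to do that; the names are hypothetical, it only inspects the file extension rather than the actual container, and it assumes "500 MB" means 500 × 1024 × 1024 bytes, which the page does not specify.

```typescript
// Assumption: "500 MB" is interpreted as 500 MiB.
const MAX_BYTES = 500 * 1024 * 1024;
const SUPPORTED_EXTENSIONS = ["mp4", "mov", "webm", "mkv", "m4v", "avi"];

// Returns an error message, or null when the file looks acceptable.
function validateVideoFile(name: string, sizeBytes: number): string | null {
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return `Unsupported container: ${ext || "(no extension)"}`;
  }
  if (sizeBytes > MAX_BYTES) {
    return "File is larger than 500 MB";
  }
  return null;
}
```

In a browser, the `name` and `size` properties of a dropped `File` object would supply the two arguments.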

Which languages can it transcribe?

All 99 languages Whisper supports — including English, Spanish, Mandarin, French, Arabic, Hindi, German, Russian, Portuguese, Japanese, and many more. The transcript stays in whatever language was spoken in the video.

How long can the video be?

Files up to 500 MB are accepted, which usually covers about an hour of HD video or several hours of compressed footage. Long recordings are processed in 30-second chunks with a 5-second overlap, so words that fall on a chunk boundary are not cut off and the transcript stays coherent.
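
To make the chunking concrete, here is a small sketch that plans the windows for a recording of a given length. It assumes the overlap means each window starts 25 seconds after the previous one (30 minus 5); the tool may define the overlap differently, and how the overlapping text is merged afterwards is not covered here. The function and type names are illustrative.

```typescript
interface AudioWindow { start: number; end: number; }

// Splits [0, duration) into windows of `chunk` seconds in which each window
// shares `overlap` seconds with the one before it. All values are in seconds.
function planWindows(duration: number, chunk = 30, overlap = 5): AudioWindow[] {
  if (chunk <= overlap) throw new Error("chunk must be longer than overlap");
  const step = chunk - overlap;
  const windows: AudioWindow[] = [];
  for (let start = 0; start < duration; start += step) {
    const end = Math.min(start + chunk, duration);
    windows.push({ start, end });
    if (end === duration) break; // this window already reaches the end of the audio
  }
  return windows;
}
```

For a 70-second recording this produces three windows: 0 to 30, 25 to 55, and 50 to 70.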

Is the video uploaded to a server?

No. Both the model and your video stay in your browser. FFmpeg.wasm extracts the audio locally, then Whisper transcribes it on-device using WebGPU or WebAssembly. Nothing leaves your computer.
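
The choice between WebGPU and WebAssembly can be sketched as a simple feature check. This is an assumption about how such a fallback might look, not the tool's actual logic; the function name is hypothetical. The presence of `navigator.gpu` only indicates that the API is exposed, so a real implementation would likely also confirm that a GPU adapter can be obtained before committing to WebGPU.

```typescript
type Backend = "webgpu" | "wasm";

// `nav` is typed loosely so the sketch also runs outside a browser.
// Returns "webgpu" when the object exposes a `gpu` property, else "wasm".
function pickBackend(nav: object | undefined): Backend {
  return nav !== undefined && "gpu" in nav ? "webgpu" : "wasm";
}
```

In a page, the global `navigator` object would be passed in; browsers without WebGPU support then fall back to the WebAssembly path.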

Transcribe video to text

Convert spoken video into text and subtitles directly in your browser.

About Transcribe video to text

Drop a video file and get a written transcript plus ready-to-use subtitles in the same language the speaker used — no upload, no account, no app to install. Handytool extracts the audio with FFmpeg and runs OpenAI's open-source Whisper model directly in your browser using WebGPU when available, so your interviews, lectures, Zoom recordings, YouTube videos, and meetings stay fully private. Download the result as plain text, an SRT subtitle file, or a WebVTT file ready for video players and YouTube.

Transcribe video to text features

01
99 languages, auto-detected
Whisper detects the spoken language and writes the transcript in that same language: Spanish stays Spanish, Japanese stays Japanese, German stays German. Pick the language manually if auto-detection struggles, for example with a less widely spoken language or heavy accents.
02
Subtitles ready for any player
Every transcription comes with timestamped chunks you can export as .srt or .vtt — drop them straight into Premiere, Final Cut, DaVinci Resolve, or upload them as a caption track on YouTube, Vimeo, or LinkedIn.
03
Private, in-browser processing
Audio is extracted with FFmpeg.wasm and transcribed by Whisper, both running on your device with WebGPU acceleration where supported. The video is never uploaded, and the downloaded model is kept in your browser cache for later runs.

Guides

How to Transcribe a Video to Text Free in Your Browser
Video guide, 5 min, updated Feb 11, 2026
Convert spoken video to text and subtitles free in your browser. Supports 99 languages, outputs SRT and VTT. No upload: Whisper AI runs locally.

Related tools

Voice enhancer for video

Trim Video

Cut & Edit Video
