Handytool
Video · Free · Runs locally

Transcribe video to text

Convert spoken video into text and subtitles directly in your browser.

Supported formats: .mp4, .mov, .webm, .mkv, .m4v, .avi

Runs entirely in your browser.

About Transcribe video to text

Drop a video file and get a written transcript plus ready-to-use subtitles in the same language the speaker used — no upload, no account, no app to install. Handytool extracts the audio with FFmpeg and runs OpenAI's open-source Whisper model directly in your browser using WebGPU when available, so your interviews, lectures, Zoom recordings, YouTube videos, and meetings stay fully private. Download the result as plain text, an SRT subtitle file, or a WebVTT file ready for video players and YouTube.

Transcribe video to text features

  • 01

    99 languages, auto-detected

    Whisper detects the spoken language and writes the transcript in that same language — Spanish stays Spanish, Japanese stays Japanese, German stays German. Pick a language manually if your video is in a niche locale or has heavy accents.

  • 02

    Subtitles ready for any player

    Every transcription comes with timestamped chunks you can export as .srt or .vtt — drop them straight into Premiere, Final Cut, DaVinci Resolve, or upload them as a caption track on YouTube, Vimeo, or LinkedIn.

  • 03

    Private, in-browser processing

    Audio is extracted with FFmpeg.wasm and transcribed by Whisper, both running on your device with WebGPU acceleration where supported. The video is never uploaded; all processing happens inside your browser.
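
The timestamped chunks mentioned under feature 02 map naturally onto subtitle cues. The sketch below shows one way such chunks could be serialized to SRT and WebVTT. The `Chunk` shape and the function names are assumptions made for illustration; the page does not document the tool's internal data structures.

```typescript
// Hypothetical chunk shape; the tool's real data structure is not documented.
interface Chunk {
  start: number; // cue start, in seconds
  end: number;   // cue end, in seconds
  text: string;
}

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before the milliseconds, blank line between cues.
function toSrt(chunks: Chunk[]): string {
  return chunks
    .map((c, i) =>
      `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text.trim()}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, dot before the milliseconds, cue numbers optional.
function toVtt(chunks: Chunk[]): string {
  const cues = chunks.map((c) =>
    `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text.trim()}\n`);
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```

The two formats differ mainly in the header line and the millisecond separator, which is why a single timestamp helper can serve both.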

Transcribe video to text FAQ

How do I transcribe a video file?
Drop your video file (MP4, MOV, WebM, MKV, M4V, or AVI) into the tool and click Transcribe. The audio is extracted locally with FFmpeg, then passed to Whisper. The first run downloads the speech model (~150 MB); after that, transcription works offline.
Can I generate subtitles for YouTube?
Yes. After transcribing, click Download .srt or Download .vtt — both formats are accepted by YouTube Studio's caption uploader, as well as Vimeo, LinkedIn, and most video editors.
Which video formats are supported?
MP4, MOV, WebM, MKV, M4V, and AVI containers up to 500 MB. The audio track inside the video is what matters — common codecs like AAC, MP3, Opus, and Vorbis all work.
Which languages can it transcribe?
All 99 languages Whisper supports — including English, Spanish, Mandarin, French, Arabic, Hindi, German, Russian, Portuguese, Japanese, and many more. The transcript stays in whatever language was spoken in the video.
How long can the video be?
Files up to 500 MB are accepted, which usually covers an hour of HD video or several hours of compressed footage. Long recordings are processed in 30-second chunks with 5-second overlap so the transcript stays coherent.
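
As a rough illustration of the chunking described above, the sketch below computes 30-second windows that each share 5 seconds with the previous one. It assumes "overlap" means the audio shared between consecutive windows; the function name, the `AudioWindow` type, and the parameter names are hypothetical and are not taken from the tool.

```typescript
// Hypothetical window type: a half-open span of audio, in seconds.
interface AudioWindow {
  start: number;
  end: number;
}

// Split a recording into fixed-length windows that overlap their neighbours.
// Assumption: consecutive windows share `overlap` seconds of audio, so each
// new window starts `chunkLen - overlap` seconds after the previous one.
function chunkWindows(duration: number, chunkLen = 30, overlap = 5): AudioWindow[] {
  if (chunkLen <= overlap) {
    throw new Error("chunkLen must be greater than overlap");
  }
  const stride = chunkLen - overlap;
  const windows: AudioWindow[] = [];
  for (let start = 0; start < duration; start += stride) {
    const end = Math.min(start + chunkLen, duration);
    windows.push({ start, end });
    if (end === duration) break; // the final window reached the end of the audio
  }
  return windows;
}
```

With the defaults, a 70-second recording yields three windows: 0 to 30, 25 to 55, and 50 to 70 seconds. The shared 5 seconds gives the transcriber context on both sides of each boundary, so words cut at a window edge can be recovered from the neighbouring window.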
Is the video uploaded to a server?
No. Both the model and your video stay in your browser. FFmpeg.wasm extracts the audio locally, then Whisper transcribes it on-device using WebGPU or WebAssembly. Nothing leaves your computer.
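
The choice between WebGPU and WebAssembly can be made with a simple feature check. The following is a minimal sketch of such a check, not the tool's actual code: `pickBackend`, `Backend`, and `GpuCapable` are hypothetical names, and the only browser API relied on is the standard `navigator.gpu.requestAdapter()`.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type for the part of `navigator` used here, so the
// sketch compiles without WebGPU type declarations installed.
interface GpuCapable {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Prefer WebGPU when the browser exposes it and returns an adapter;
// otherwise fall back to WebAssembly.
async function pickBackend(nav: GpuCapable | undefined): Promise<Backend> {
  if (!nav?.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return "wasm";
  }
}
```

In a browser this would be called as `pickBackend(navigator as GpuCapable)`. Checking for an adapter, and not just for the presence of `navigator.gpu`, matters because a browser can expose the API while having no usable GPU.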
