Transcribe video to text
Convert spoken video into text and subtitles directly in your browser.
Runs entirely in your browser.
About Transcribe video to text
Drop a video file and get a written transcript plus ready-to-use subtitles in the same language the speaker used — no upload, no account, no app to install. Handytool extracts the audio with FFmpeg and runs OpenAI's open-source Whisper model directly in your browser using WebGPU when available, so your interviews, lectures, Zoom recordings, YouTube videos, and meetings stay fully private. Download the result as plain text, an SRT subtitle file, or a WebVTT file ready for video players and YouTube.
Transcribe video to text features
- 01
99 languages, auto-detected
Whisper detects the spoken language and writes the transcript in that same language: Spanish stays Spanish, Japanese stays Japanese, German stays German. Pick a language manually if auto-detection struggles, for example with a less common language or heavy accents.
- 02
Subtitles ready for any player
Every transcription comes with timestamped chunks you can export as .srt or .vtt — drop them straight into Premiere, Final Cut, DaVinci Resolve, or upload them as a caption track on YouTube, Vimeo, or LinkedIn.
- 03
Private, in-browser processing
Audio is extracted with FFmpeg.wasm and transcribed by Whisper, both running on your device with WebGPU acceleration where supported. The video is never uploaded; it is processed locally and stays in your browser.
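To make the subtitle export more concrete, here is a minimal sketch of how timestamped chunks could be turned into SRT and WebVTT text. The `Chunk` shape and the function names are hypothetical and chosen for illustration only; this is not Handytool's actual code. The main difference between the two formats is that SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical chunk shape: start/end in seconds plus the recognized text.
interface Chunk {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".".
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms, 3)}`;
}

// SRT: numbered cues separated by a blank line.
function toSrt(chunks: Chunk[]): string {
  return chunks
    .map((c, i) => {
      const range = `${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}`;
      return `${i + 1}\n${range}\n${c.text.trim()}\n`;
    })
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(chunks: Chunk[]): string {
  const cues = chunks.map((c) => {
    const range = `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}`;
    return `${range}\n${c.text.trim()}\n`;
  });
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```

Calling `toSrt` or `toVtt` on the same chunk array yields the text you would save as a `.srt` or `.vtt` file, respectively.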
Transcribe video to text FAQ
- How do I transcribe a video file?
- Drop your video file (MP4, MOV, WebM, MKV, M4V, or AVI) into the tool and click Transcribe. The audio is extracted locally with FFmpeg, then passed to Whisper. The first run downloads the speech model (~150 MB); after that, transcription works offline.
- Can I generate subtitles for YouTube?
- Yes. After transcribing, click Download .srt or Download .vtt — both formats are accepted by YouTube Studio's caption uploader, as well as Vimeo, LinkedIn, and most video editors.
- Which video formats are supported?
- MP4, MOV, WebM, MKV, M4V, and AVI containers up to 500 MB. The audio track inside the video is what matters — common codecs like AAC, MP3, Opus, and Vorbis all work.
- Which languages can it transcribe?
- All 99 languages Whisper supports — including English, Spanish, Mandarin, French, Arabic, Hindi, German, Russian, Portuguese, Japanese, and many more. The transcript stays in whatever language was spoken in the video.
- How long can the video be?
- Files up to 500 MB are accepted, which usually covers about an hour of HD video or several hours of heavily compressed footage. Long recordings are processed in 30-second chunks with a 5-second overlap, so the transcript stays coherent across chunk boundaries.
- Is the video uploaded to a server?
- No. Both the model and your video stay in your browser. FFmpeg.wasm extracts the audio locally, then Whisper transcribes it on-device using WebGPU or WebAssembly. Nothing leaves your computer.
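The 30-second chunks with a 5-second overlap mentioned above can be pictured with a small sketch. It assumes the simplest reading of that description: each window is up to 30 seconds long and begins 25 seconds after the previous one, so neighbours share 5 seconds of audio. The `AudioWindow` type and `chunkWindows` function are hypothetical names for illustration, and the tool's real windowing may differ.

```typescript
// Hypothetical window type: start/end offsets in seconds within the audio.
interface AudioWindow {
  start: number;
  end: number;
}

// Split `durationS` seconds of audio into windows of up to `chunkS` seconds,
// where each window overlaps the previous one by `overlapS` seconds.
function chunkWindows(durationS: number, chunkS = 30, overlapS = 5): AudioWindow[] {
  if (chunkS <= 0 || overlapS < 0 || overlapS >= chunkS) {
    throw new Error("overlap must be non-negative and smaller than the chunk length");
  }
  const step = chunkS - overlapS; // distance between consecutive window starts
  const windows: AudioWindow[] = [];
  for (let start = 0; start < durationS; start += step) {
    const end = Math.min(start + chunkS, durationS);
    windows.push({ start, end });
    if (end >= durationS) break; // the last window already reaches the end
  }
  return windows;
}
```

For a 70-second recording, this produces three windows: 0 to 30, 25 to 55, and 50 to 70 seconds. The shared 5 seconds give the model context on both sides of each boundary, which helps avoid words being cut in half.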