Key takeaways
- Two-stage pipeline: multi-pass neural denoising plus a voice-activity gate that silences non-speech frames.
- Controls for isolation strength and number of passes let you trade a natural sound against more aggressive isolation.
- Works best when the voice is louder than the background music or crowd noise.
- Output is a 48 kHz mono WAV; nothing is uploaded to any server.
When You Need More Than Noise Reduction
Standard noise reduction handles steady background hiss and hum. But what about a podcast guest recorded in a busy cafe, an interview done over a music bed, or a speech filmed at a crowded event? When the background is loud, varied, or musical, a single denoise pass isn't enough — you need a system that can also identify which parts of the audio are speech and silence everything else.
Handytool's voice isolator runs a two-stage pipeline: multiple passes of RNNoise neural denoising to tighten the noise floor, followed by a voice-activity-driven gate that suppresses frames the model identifies as non-speech. The result is a track where silence replaces the background between phrases, rather than a quieter version of the original noise. The whole process runs locally in your browser — no upload, no account needed.
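The two stages can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Handytool's code. It assumes a hypothetical `Denoiser` callable that returns a cleaned frame plus a voice-activity probability. RNNoise's C API does return a per-frame voice probability and works on 480-sample frames (10 ms at 48 kHz), but the Python signature here is invented. The gate threshold of 0.5 is an illustrative choice. `make_toy_denoiser` is an energy-based stand-in for the neural model so the example runs; it is not RNNoise.

```python
from typing import Callable, List, Sequence, Tuple

FRAME_SIZE = 480  # 10 ms at 48 kHz, the frame size RNNoise operates on

# Hypothetical interface: one frame in, (cleaned frame, voice probability 0..1) out.
Denoiser = Callable[[List[float]], Tuple[List[float], float]]


def split_frames(samples: Sequence[float], size: int = FRAME_SIZE) -> List[List[float]]:
    """Cut the signal into fixed-size frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), size):
        frame = list(samples[start:start + size])
        frame.extend([0.0] * (size - len(frame)))
        frames.append(frame)
    return frames


def isolate(samples: Sequence[float], make_denoiser: Callable[[], Denoiser],
            passes: int = 2, threshold: float = 0.5) -> List[float]:
    """Stage 1: run `passes` denoise passes. Stage 2: silence frames judged non-speech."""
    if passes < 1:
        raise ValueError("at least one denoise pass is required")
    frames = split_frames(samples)
    voice_prob = [0.0] * len(frames)
    for _ in range(passes):
        denoise = make_denoiser()  # fresh state for each pass
        for i, frame in enumerate(frames):
            frames[i], voice_prob[i] = denoise(frame)
    out: List[float] = []
    for frame, p in zip(frames, voice_prob):
        # Keep frames the last pass considers speech; replace the rest with silence.
        out.extend(frame if p >= threshold else [0.0] * len(frame))
    return out[:len(samples)]  # drop the zero padding added by split_frames


def make_toy_denoiser() -> Denoiser:
    """Toy stand-in (NOT RNNoise): passes audio through, flags loud frames as voice."""
    def denoise(frame: List[float]) -> Tuple[List[float], float]:
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        return list(frame), (1.0 if rms > 0.1 else 0.0)
    return denoise


# Usage: one loud frame is kept, one quiet frame is gated to silence.
signal = [0.5] * 480 + [0.01] * 480
cleaned = isolate(signal, make_toy_denoiser, passes=2)
```

Creating a fresh denoiser for each pass reflects that RNNoise-style models carry state from frame to frame. Gating on the last pass's probability is one design choice among several; averaging the probability across passes would be another.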
How to Isolate a Voice From Background Noise
- 01 Drop your audio file
Drag an MP3, WAV, M4A, OGG, or FLAC file into the tool. Files up to 200 MB are accepted.
- 02 Set isolation strength
Strength controls how aggressively non-voice frames are gated. Start at 70–80 for podcasts or interviews; raise it to 90–100 to strip out a music bed or crowd noise.
- 03 Choose the number of passes
Each additional pass of neural denoising lowers the noise floor further. One pass is enough for lightly noisy recordings; two or three passes help when the background noise is loud or mixed.
- 04 Click Isolate and download
The pipeline runs locally in your browser. When it finishes, download the isolated voice as a 48 kHz mono WAV.
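The starting points from steps 2 and 3 can be captured in a small helper. `IsolationSettings` and `suggest_settings` are hypothetical names, and the specific values (the midpoints 75 and 95, and two rather than three passes) are picks from within the recommended ranges, not documented defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolationSettings:
    strength: int  # 0-100: how aggressively non-voice frames are gated
    passes: int    # number of neural denoise passes


def suggest_settings(music_or_crowd: bool, loud_or_mixed: bool) -> IsolationSettings:
    """Pick a starting point from the guidance in steps 2 and 3 (hypothetical helper)."""
    # Step 2: 70-80 for podcasts or interviews, 90-100 for a music bed or crowd.
    strength = 95 if music_or_crowd else 75
    # Step 3: one pass for light noise, two or three when noise is loud or mixed.
    passes = 2 if loud_or_mixed else 1
    return IsolationSettings(strength=strength, passes=passes)


# Usage: an interview with a music bed underneath.
print(suggest_settings(music_or_crowd=True, loud_or_mixed=True))
```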
Recordings That Benefit Most From Voice Isolation
- Podcast guests recorded in cafes or restaurants
- Interviews filmed at conferences or events with crowd noise
- Speeches or presentations with a music bed underneath
- Field recordings from outdoors with wind and traffic
- Phone or video call recordings with noisy environments on one end
Your Audio Is Processed Locally, Not on a Server
The isolation pipeline is a 125 KB WebAssembly module loaded once in your browser. When you drop a file in, it is decoded and processed entirely on your own machine. No audio is streamed to a server, no account is created, and nothing is retained after you close the tab.
Processing time depends on the number of passes and file length. Two passes on a 10-minute file take roughly two to three minutes on a modern laptop. Files up to 200 MB are accepted.
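A pre-check for the accepted formats and size limit could look like the sketch below. Matching on the file extension and reading 200 MB as 200 × 1024 × 1024 bytes are both assumptions; the tool itself may inspect the decoded file instead, or use decimal megabytes. `check_input` is a hypothetical name.

```python
from pathlib import PurePath

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}
MAX_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" read as 200 MiB


def check_input(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file falls outside the stated format or size limits."""
    ext = PurePath(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; the limit is {MAX_BYTES}")


# Usage: passes silently for an accepted file.
check_input("interview.FLAC", 50 * 1024 * 1024)
```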
Voice Isolator FAQ
How do I remove background music from a voice recording?
Drop your file into the Voice Isolator, set strength to 90–100, choose two or three passes, and click Isolate. The gate silences non-speech frames, and the denoiser attenuates music that bleeds through while the speaker is talking.
How is this different from the Voice Enhancer?
Voice Enhancer does a single denoise pass for a natural-feeling cleanup of steady noise. Voice Isolator stacks multiple passes and adds a voice-activity gate that silences anything outside speech — better for music, crowds, and varied noise.
What does the isolation strength slider do?
It sets how aggressively non-voice frames are attenuated. At 0 the gate is loose; at 100 any frame the model is not confident contains voice is silenced. 70–80 is a good starting point for podcasts, 90–100 for music or crowd removal.
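One way a 0–100 strength value could drive the gate is sketched below. `frame_gain` is a hypothetical function, and its formula is an assumption chosen to match the behavior described (loose at 0, silencing low-confidence frames at 100); the tool's actual mapping is not documented here.

```python
def frame_gain(voice_prob: float, strength: float) -> float:
    """Hypothetical mapping from the 0-100 strength slider to a per-frame gain.

    Assumptions (not the tool's documented formula):
    - the voice-probability threshold rises linearly from 0.2 at strength 0
      to 0.9 at strength 100;
    - frames below the threshold are scaled by (1 - strength / 100), so at 0
      nothing is attenuated and at 100 they are fully silenced.
    """
    if not 0 <= strength <= 100:
        raise ValueError("strength must be between 0 and 100")
    if not 0.0 <= voice_prob <= 1.0:
        raise ValueError("voice_prob must be between 0 and 1")
    s = strength / 100.0
    threshold = 0.2 + 0.7 * s
    if voice_prob >= threshold:
        return 1.0       # confident speech passes through untouched
    return 1.0 - s       # non-speech is attenuated in proportion to strength


# Usage: the same uncertain frame at two slider positions.
print(frame_gain(0.5, 75), frame_gain(0.5, 100))
```

With this mapping, a strength of 75 gates frames whose voice probability is below 0.725 and scales them to 25% of their level, while 100 silences anything below 0.9.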
Is my audio uploaded to a server?
No. The pipeline is a WebAssembly module that runs locally on your CPU. Nothing leaves your computer.
What output format do I get?
A mono 48 kHz WAV in 16-bit PCM. Use the Convert audio tool to export as MP3 if you need a smaller file.
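For reference, writing float samples as a mono, 16-bit PCM, 48 kHz WAV needs only Python's standard `wave` and `struct` modules. This is a generic sketch of the format, not the tool's exporter; the helper names are hypothetical, and clipping to [-1.0, 1.0] before scaling by 32767 is one common convention among several.

```python
import io
import struct
import wave
from typing import Sequence

SAMPLE_RATE = 48_000  # 48 kHz, matching the stated output format


def float_to_int16(x: float) -> int:
    """Clip a float sample to [-1.0, 1.0] and scale it to a signed 16-bit value."""
    x = max(-1.0, min(1.0, x))
    return round(x * 32767)


def write_mono_wav(samples: Sequence[float], dest) -> None:
    """Write float samples to `dest` (a path or binary file object) as mono 16-bit PCM."""
    pcm = struct.pack(f"<{len(samples)}h", *(float_to_int16(s) for s in samples))
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


# Usage: write to an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
write_mono_wav([0.0, 0.5, -1.0], buf)
```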
How long can the recording be?
Files up to 200 MB are accepted. With two passes, processing runs at roughly 3–5 times real-time speed on a modern laptop, so a 10-minute recording takes about two to three minutes.
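The timing arithmetic works out as follows: processing time is the recording length divided by the speed-up over real time. The helper names are hypothetical, and `minutes_for_passes` additionally assumes cost grows linearly with the number of passes, which is an assumption rather than a measured figure.

```python
def isolation_minutes(duration_min: float, speedup: float) -> float:
    """Wall-clock minutes to process `duration_min` of audio at `speedup` x real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return duration_min / speedup


def minutes_for_passes(duration_min: float, passes: int, two_pass_speedup: float) -> float:
    """Assumes each pass costs the same, so one pass runs twice as fast as two."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    per_pass_speedup = two_pass_speedup * 2
    return duration_min * passes / per_pass_speedup


# Two passes on a 10-minute file at 3-5x real time:
fast = isolation_minutes(10, 5)  # 2.0 minutes
slow = isolation_minutes(10, 3)  # about 3.33 minutes
```

At 5× a 10-minute file takes 2 minutes and at 3× about 3⅓ minutes, in line with the rough two-to-three-minute estimate.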