voice_pipeline
Turn messy audio into Whisper-ready input.
System Documentation
What problem does it solve?
This Skill addresses unreliable speech detection and noisy audio preprocessing. It transforms Discord voice PCM into clean, VAD-filtered, Whisper-ready audio using two-tier voice activity detection: a neural detector first, with a deterministic fallback.
Core Features & Use Cases
- Two-tier Voice Activity Detection: Uses Silero VAD as the primary neural confidence-based detector and webrtcvad as a deterministic fallback when the neural path fails or is unavailable.
- Noise/Music Gating for Better STT: Applies librosa spectral analysis (spectral flatness and zero-crossing rate) to suppress background music and reduce Whisper hallucinations.
- Correct Audio Buffer Normalization: Uses numpy to reliably convert int16 PCM bytes → float32 normalized to [-1, 1] and to maintain the exact frame slicing that VAD requires.
- Discord-to-Transcription Integration: Designed for EOS Discord voice capture flows, including per-user buffering and silence-threshold flushing for utterance segmentation.
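The buffer normalization and frame slicing described above can be sketched as follows. This is a minimal illustration, not the Skill's actual code: the function names `pcm16_to_float32` and `slice_frames` are hypothetical, and the input is assumed to be mono, little-endian int16 PCM that has already been resampled to the rate your VAD expects.

```python
import numpy as np


def pcm16_to_float32(pcm: bytes) -> np.ndarray:
    """Convert little-endian int16 PCM bytes to float32 samples in [-1, 1)."""
    # Drop a trailing odd byte so the buffer holds whole int16 samples.
    usable = len(pcm) - (len(pcm) % 2)
    samples = np.frombuffer(pcm[:usable], dtype="<i2")
    # Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
    return samples.astype(np.float32) / 32768.0


def slice_frames(samples: np.ndarray, frame_len: int) -> np.ndarray:
    """Split samples into frames of exactly frame_len, dropping the remainder."""
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)


if __name__ == "__main__":
    raw = np.array([0, 16384, -32768, 32767], dtype="<i2").tobytes()
    print(pcm16_to_float32(raw))
    # Hypothetical frame length: 480 samples is 30 ms at 16 kHz.
    print(slice_frames(np.zeros(1000, dtype=np.float32), 480).shape)
```

Exact frame lengths matter because VAD libraries tend to reject odd sizes. webrtcvad, for example, accepts only 10, 20, or 30 ms frames, so a partial trailing frame is better held back for the next chunk than passed through.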
Use Cases: streaming Discord voice capture, tuning real-time transcription quality, improving robustness in noisy environments or music-heavy channels, and building or modifying audio buffers before downstream STT.
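Per-user buffering with silence-threshold flushing, as mentioned for utterance segmentation, might look like the sketch below. The class name `UtteranceBuffer`, its `push` method, and the default of 25 silent frames are all assumptions made for illustration; the Skill's real thresholds and data structures are not specified here.

```python
from collections import defaultdict


class UtteranceBuffer:
    """Collects speech frames per user and flushes after sustained silence."""

    def __init__(self, silence_frames_to_flush: int = 25):
        # Hypothetical default: 25 frames of 20 ms each is 0.5 s of silence.
        self.silence_frames_to_flush = silence_frames_to_flush
        self._speech = defaultdict(list)   # user_id -> frames of the utterance
        self._pending = defaultdict(list)  # user_id -> trailing silent frames

    def push(self, user_id, frame: bytes, is_speech: bool):
        """Add one frame; return the finished utterance bytes, or None."""
        speech, pending = self._speech[user_id], self._pending[user_id]
        if is_speech:
            # A pause shorter than the threshold stays inside the utterance.
            speech.extend(pending)
            pending.clear()
            speech.append(frame)
            return None
        if not speech:
            return None  # leading silence: nothing to segment yet
        pending.append(frame)
        if len(pending) < self.silence_frames_to_flush:
            return None
        # Threshold reached: emit the utterance without the trailing silence.
        utterance = b"".join(speech)
        speech.clear()
        pending.clear()
        return utterance
```

Each user gets an independent buffer, so overlapping speakers in one channel are segmented separately, and the returned bytes can be handed to the STT stage as one utterance.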
Quick Start
Use the voice_pipeline Skill to implement Discord audio capture preprocessing, apply two-tier VAD with music filtering, and output WAV/segments suitable for your Whisper STT stage.
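As a starting point, the two-tier decision can be sketched with the detectors passed in as plain callables. This is an illustrative assumption about the control flow, not the Skill's implementation: `two_tier_is_speech`, its parameter names, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Optional


def two_tier_is_speech(
    frame: bytes,
    neural_prob: Optional[Callable[[bytes], float]],
    deterministic: Callable[[bytes], bool],
    threshold: float = 0.5,  # hypothetical confidence cutoff
) -> bool:
    """Prefer the neural detector; fall back to the deterministic one."""
    if neural_prob is not None:
        try:
            # Tier 1: neural speech probability compared with the threshold.
            return neural_prob(frame) >= threshold
        except Exception:
            # Neural path failed (model error, bad frame size): fall through.
            pass
    # Tier 2: deterministic detector, used when tier 1 is missing or fails.
    return deterministic(frame)


if __name__ == "__main__":
    def broken_model(_frame: bytes) -> float:
        raise RuntimeError("model unavailable")

    print(two_tier_is_speech(b"\x00" * 640, broken_model, lambda f: True))
```

In a real pipeline, `neural_prob` would presumably wrap a Silero VAD model call and `deterministic` would wrap `webrtcvad.Vad(mode).is_speech(frame, sample_rate)`. Keeping both behind callables lets the fallback logic be tested without loading either library.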
Dependency Matrix
Required Modules
None required
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: voice_pipeline
Download link: https://github.com/antonyfmunoz/OS/archive/main.zip#voice-pipeline
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.