voice_pipeline

Author: antonyfmunoz


Turn messy audio into Whisper-ready input.

Category: Software Engineering
Tags: #speech-to-text #librosa #voice activity detection #audio preprocessing #Discord voice #silero vad #webrtcvad


Version: 1.0.0


System Documentation

What problem does it solve?

This Skill addresses unreliable speech detection and noisy audio preprocessing. It transforms Discord voice PCM into clean, VAD-filtered, Whisper-ready audio using two-tier voice activity detection: a neural primary detector with a deterministic fallback.
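The two-tier idea can be sketched as a small wrapper that tries a primary detector and drops to a fallback when the primary is missing or raises. This is a minimal illustration, not the Skill's actual code: `two_tier_is_speech`, `broken_neural_detector`, and `energy_detector` are hypothetical names, and the detectors are plain callables standing in for Silero VAD and webrtcvad.

```python
from typing import Callable, Optional

# A detector takes one frame of audio and returns True if it contains speech.
Detector = Callable[[bytes], bool]


def two_tier_is_speech(
    frame: bytes,
    primary: Optional[Detector],
    fallback: Detector,
) -> bool:
    """Try the primary (neural) detector; use the fallback if it is missing or raises."""
    if primary is not None:
        try:
            return primary(frame)
        except Exception:
            # Any failure in the neural path drops through to the deterministic tier.
            pass
    return fallback(frame)


# Hypothetical stand-ins for the real backends, used only to exercise the wrapper.
def broken_neural_detector(frame: bytes) -> bool:
    raise RuntimeError("model not loaded")


def energy_detector(frame: bytes) -> bool:
    # Crude placeholder: treat any non-zero byte as "speech".
    return any(frame)
```

Passing the detectors in as arguments keeps the fallback logic testable without loading a model, and lets the primary be `None` when the neural backend is not installed.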

Core Features & Use Cases

Two-tier Voice Activity Detection: Uses Silero VAD as the primary neural confidence-based detector and webrtcvad as a deterministic fallback when the neural path fails or is unavailable.
Noise/Music Gating for Better STT: Applies librosa spectral analysis (spectral flatness and zero-crossing rate) to suppress background music and reduce Whisper hallucinations.
Correct Audio Buffer Normalization: Uses numpy to convert int16 PCM bytes to float32 samples normalized to [-1, 1] and to slice audio into the exact frame sizes each VAD backend requires.
Discord-to-Transcription Integration: Designed for EOS Discord voice capture flows, including per-user buffering and silence-threshold flushing for utterance segmentation.
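To make the two spectral measures concrete, the sketch below computes them for a single frame with plain numpy so it runs without extra dependencies. It is a simplified stand-in, under that assumption, for `librosa.feature.spectral_flatness` and `librosa.feature.zero_crossing_rate`, which operate frame by frame over a whole signal. Tonal content pushes flatness toward 0 and noise-like content pushes it toward 1. Where speech and music fall for a given channel has to be measured, so no gating thresholds are hard-coded here.

```python
import numpy as np


def spectral_flatness(frame: np.ndarray, eps: float = 1e-10) -> float:
    """Geometric mean over arithmetic mean of the power spectrum (0 = tonal, 1 = flat)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    power = np.maximum(power, eps)  # avoid log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return float(geometric_mean / np.mean(power))


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


# Synthetic one-second test signals at 16 kHz: a pure tone and white noise.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)
noise = np.random.default_rng(0).standard_normal(sr).astype(np.float32)

print("tone :", spectral_flatness(tone), zero_crossing_rate(tone))
print("noise:", spectral_flatness(noise), zero_crossing_rate(noise))
```

The tone scores far lower than the noise on both measures, which is the separation a music or noise gate would be tuned around.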

Use Cases: streaming Discord voice capture, tuning real-time transcription quality, improving robustness in noisy environments or music-heavy channels, and building or modifying audio buffers before downstream STT.
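For the buffer-handling use case, a minimal sketch of the int16-to-float32 conversion and fixed-size frame slicing might look like this. The function names are hypothetical, and the input is assumed to be mono, little-endian 16-bit PCM. The frame length depends on the VAD backend; webrtcvad, for example, accepts only 10, 20, or 30 ms frames.

```python
import numpy as np


def pcm16_to_float32(pcm: bytes) -> np.ndarray:
    """Interpret little-endian int16 PCM bytes and scale to float32 in [-1, 1)."""
    samples = np.frombuffer(pcm, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def split_frames(samples: np.ndarray, frame_len: int) -> np.ndarray:
    """Return an array of shape (n_frames, frame_len), dropping any trailing partial frame."""
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)


# Example: 30 ms frames at 16 kHz are 480 samples each.
pcm = np.array([0, 16384, -32768, 32767], dtype="<i2").tobytes()
floats = pcm16_to_float32(pcm)
frames = split_frames(np.zeros(1000, dtype=np.float32), frame_len=480)
```

Dividing by 32768 maps the most negative int16 value to exactly -1.0 and keeps the largest positive value just below 1.0. Dropping the trailing partial frame is one choice; a streaming pipeline could instead carry the remainder over to the next chunk.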

Quick Start

Use the voice_pipeline Skill to implement Discord audio capture preprocessing, apply two-tier VAD with music filtering, and output WAV files or audio segments suitable for your Whisper STT stage.
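As one way to picture the segmentation step, the sketch below keeps a separate buffer per user and flushes an utterance once enough consecutive silent frames arrive. It is an illustration under stated assumptions: `UtteranceBuffer`, its `push` method, and the default of 10 silent frames are hypothetical, and the `is_speech` flag is expected to come from the VAD stage.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class UtteranceBuffer:
    """Collects speech frames per user and flushes after a run of silent frames."""

    def __init__(self, silence_frames_to_flush: int = 10) -> None:
        # Hypothetical default: 10 consecutive silent frames end an utterance.
        self.silence_frames_to_flush = silence_frames_to_flush
        self._frames: Dict[int, List[bytes]] = defaultdict(list)
        self._silent_run: Dict[int, int] = defaultdict(int)

    def push(self, user_id: int, frame: bytes, is_speech: bool) -> Optional[bytes]:
        """Add one frame; return the finished utterance when silence ends it, else None."""
        if is_speech:
            self._frames[user_id].append(frame)
            self._silent_run[user_id] = 0
            return None
        if not self._frames[user_id]:
            return None  # silence before any speech: nothing to flush
        # Silent frames are counted but not stored in the utterance.
        self._silent_run[user_id] += 1
        if self._silent_run[user_id] < self.silence_frames_to_flush:
            return None
        utterance = b"".join(self._frames.pop(user_id))
        self._silent_run.pop(user_id, None)
        return utterance


buf = UtteranceBuffer(silence_frames_to_flush=2)
buf.push(1, b"aa", True)
buf.push(1, b"bb", True)
buf.push(1, b"", False)
segment = buf.push(1, b"", False)  # second silent frame triggers the flush
```

The returned bytes are the concatenated speech frames for that user, ready to be converted or written out for the STT stage. Keying the state by user ID keeps overlapping speakers from being merged into one segment.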
