Video Whisper — Local Video/Audio Transcription
Transcribe videos to text on your Mac.
System Documentation
What problem does it solve?
This Skill replaces manual transcription of video and audio. It generates transcripts locally that you can read, search, and summarize, without relying on cloud transcription services.
Core Features & Use Cases
- Local, Apple Silicon-friendly transcription: Uses MLX Whisper to run entirely on-device on M1/M2/M3/M4 Macs.
- Works with URLs and local media: Extracts audio from local files and supports major sites (YouTube, Bilibili, Xiaohongshu, Douyin), plus podcasts and other sources supported by yt-dlp.
- Timestamped output for downstream analysis: Produces both plain text and JSON containing per-segment timings and detected language to help with indexing, QA, and summarization.
Use case: Convert a long podcast or creator video into a searchable transcript with timestamps so you can quickly find key moments and generate summaries or notes.
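To illustrate how the timestamped JSON could be used to find key moments, here is a minimal Python sketch. It assumes a Whisper-style layout (a top-level `language` field and a `segments` list whose entries carry `start`, `end`, and `text`); the Skill's actual field names may differ, and the sample data and the helper names `format_timestamp` and `find_moments` are hypothetical.

```python
import json

# Hypothetical sample shaped like Whisper-style output.
# The JSON this Skill actually writes may use different field names.
SAMPLE = """
{
  "language": "en",
  "segments": [
    {"start": 0.0, "end": 4.2, "text": "Welcome back to the show."},
    {"start": 4.2, "end": 9.8, "text": "Today we talk about local transcription."},
    {"start": 75.5, "end": 81.0, "text": "Transcription on a laptop is fast now."}
  ]
}
"""


def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_moments(transcript, keyword):
    """Return (timestamp, text) pairs for segments that mention the keyword."""
    needle = keyword.lower()
    return [
        (format_timestamp(seg["start"]), seg["text"].strip())
        for seg in transcript.get("segments", [])
        if needle in seg["text"].lower()
    ]


transcript = json.loads(SAMPLE)
for stamp, text in find_moments(transcript, "transcription"):
    print(stamp, text)
```

The match is a case-insensitive substring search, which is usually enough for skimming a long recording. To work on a real transcript, replace `SAMPLE` with the contents of the JSON file the script produces.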
Quick Start
Install yt-dlp and ffmpeg with Homebrew, create a Python virtual environment and install mlx-whisper in it, then run: bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
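Before the first run, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the Skill: it assumes the command-line tools are on your PATH as `yt-dlp` and `ffmpeg` and that the mlx-whisper package imports as `mlx_whisper`. The helper name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil


def missing_dependencies(tools=("yt-dlp", "ffmpeg"), modules=("mlx_whisper",)):
    """Return the names of CLI tools and Python modules that were not found."""
    # shutil.which returns None when an executable is not on PATH.
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


missing = missing_dependencies()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

Run it inside the virtual environment you created, since the module check only sees packages installed in the active interpreter.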
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: Video Whisper — Local Video/Audio Transcription
Download link: https://github.com/ylongw/video-whisper/archive/main.zip#video-whisper-local-video-audio-transcription
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.