Video Whisper — Local Video/Audio Transcription

Community

Transcribe videos to text on your Mac.

Author: ylongw
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill eliminates the hassle of manually transcribing videos and audio by generating accurate local transcripts you can read, search, and summarize without relying on cloud services.

Core Features & Use Cases

  • Local, Apple Silicon-friendly transcription: Uses MLX Whisper to run entirely on-device on M1/M2/M3/M4 Macs.
  • Works with URLs and local media: Extracts audio from local files and supports major sites (YouTube, Bilibili, Xiaohongshu, Douyin), podcasts, and other yt-dlp-supported sources.
  • Timestamped output for downstream analysis: Produces both plain text and JSON containing per-segment timings and detected language to help with indexing, QA, and summarization.

Use case: Convert a long podcast or creator video into a searchable transcript with timestamps so you can quickly find key moments and generate summaries or notes.
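As a rough illustration of the "find key moments" step, the sketch below searches a plain-text transcript for a keyword and turns a segment start time into a readable timestamp. The function names, the transcript file name, and the assumption that segment timings in the JSON are expressed in seconds are all hypothetical; check the Skill's actual output before relying on them.

```shell
# Hypothetical helpers for working with a finished transcript.

# Print matching lines (case-insensitive, with line numbers) from a
# plain-text transcript. $1 = literal keyword, $2 = transcript file.
search_transcript() {
  grep -n -i -F -- "$1" "$2"
}

# Convert a start time in seconds (assumed unit, fractional part dropped)
# to HH:MM:SS for use in notes.
fmt_ts() {
  local secs
  secs="${1%.*}"        # strip any fractional part, e.g. 12.84 -> 12
  secs="${secs:-0}"     # treat an empty result as zero
  printf '%02d:%02d:%02d\n' "$((secs / 3600))" "$((secs % 3600 / 60))" "$((secs % 60))"
}

# Usage (transcript.txt is a placeholder file name):
#   search_transcript "whisper" transcript.txt
#   fmt_ts 3725.42
```

Using fixed-string matching (`-F`) keeps keywords with punctuation from being interpreted as regular expressions.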

Quick Start

Install yt-dlp and ffmpeg via Homebrew, create a Python venv and install mlx-whisper, then run: bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ".
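The steps above might look like the following sketch. The venv location (`.venv`) and both function names are assumptions made for illustration, and the listing does not say whether `scripts/transcribe.sh` expects the venv to be activated first, so treat that step as a guess.

```shell
# Sketch of the Quick Start steps; nothing runs until you call a function.

# Print each named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# One-time setup: Homebrew packages plus a Python venv with mlx-whisper.
# The .venv path is an assumption, not something the Skill prescribes.
setup_video_whisper() {
  brew install yt-dlp ffmpeg
  python3 -m venv .venv
  .venv/bin/pip install mlx-whisper
}

# Usage (from the extracted Skill directory):
#   setup_video_whisper
#   missing_tools yt-dlp ffmpeg || echo "install the tools listed above first"
#   source .venv/bin/activate   # assumed; may not be required
#   bash scripts/transcribe.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

Wrapping the install commands in a function keeps the file safe to source, and the `missing_tools` check gives a quick way to confirm the two required modules are present before transcribing.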

Dependency Matrix

Required Modules

yt-dlp, ffmpeg

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: Video Whisper — Local Video/Audio Transcription
Download link: https://github.com/ylongw/video-whisper/archive/main.zip#video-whisper-local-video-audio-transcription

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
