podcast-transcript-txt
Community
Export podcast transcripts as cleaned TXT.
Author: KingJing1
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Turning podcast episodes into searchable transcripts is slow and inconsistent, especially when official transcripts are missing. This Skill extracts transcript text from the best available source, falling back from official transcripts to subtitles, visible page text, or local ASR, and writes a cleaned TXT file plus a .meta.json file for debugging.
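The fallback order (official transcript, then YouTube subtitles, then page text, then local ASR) can be pictured as a priority chain that stops at the first non-empty result and records every attempt along the way. The sketch below illustrates that idea under this assumption. It is not the Skill's actual code: `resolve_transcript`, the attempt-record fields, and the stub resolvers are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical resolver type: takes the user's input and returns transcript
# text, or None when that source has nothing to offer.
Resolver = Callable[[str], Optional[str]]


def resolve_transcript(
    source: str, resolvers: list[tuple[str, Resolver]]
) -> tuple[str, str, list[dict]]:
    """Try resolvers in priority order; return (resolver_name, text, attempts)."""
    attempts: list[dict] = []
    for name, resolver in resolvers:
        try:
            text = resolver(source)
        except Exception as exc:  # a failing source is recorded, then skipped
            attempts.append({"resolver": name, "ok": False, "error": str(exc)})
            continue
        if text and text.strip():
            attempts.append({"resolver": name, "ok": True})
            return name, text, attempts
        attempts.append({"resolver": name, "ok": False, "error": "empty result"})
    raise LookupError(f"no resolver produced a transcript for {source!r}")


# Stub resolvers standing in for the real sources (hypothetical).
def official(src: str) -> Optional[str]:
    return None  # e.g. no official transcript was published


def subtitles(src: str) -> Optional[str]:
    raise RuntimeError("no subtitles")


def page_text(src: str) -> Optional[str]:
    return "Welcome to the show."


def local_asr(src: str) -> Optional[str]:
    return "never reached in this demo"


CHAIN = [
    ("official", official),
    ("subtitles", subtitles),
    ("page_text", page_text),
    ("asr", local_asr),
]

if __name__ == "__main__":
    name, text, attempts = resolve_transcript("episode-1", CHAIN)
    print(name, attempts)
```

Because the chain is an ordered list, the same input always walks the same sources in the same order. That is one way to get the deterministic behavior the feature list describes. The `attempts` list is the kind of data a debugging file such as .meta.json could hold.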
Core Features & Use Cases
- Deterministic transcript resolution: tries official transcript sources first, then YouTube subtitles, then the visible text of the episode page, and finally local ASR with a selectable model size (small|medium).
- Multiple input types: handles YouTube URLs/IDs, episode webpages (including Xiaoyuzhou), official transcript URLs/files (TTML/JSON), direct audio URLs, Apple Podcasts title search, X/Twitter links (best-effort), and plain episode titles.
- Observable output quality: always writes a matching .meta.json with the resolver path and quality/attempt diagnostics, so you can tell whether the result came from an official transcript, subtitles, page text, or ASR.
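Supporting this many input types means the script first has to decide what kind of input it received. One plausible approach is a small classifier over the input string. The sketch below is hypothetical. The category names, the 11-character YouTube ID pattern, the extension lists, and the `xiaoyuzhoufm.com` host check are assumptions and are not taken from the Skill.

```python
import re
from urllib.parse import urlparse

# Assumptions for illustration only: YouTube video IDs are 11 URL-safe
# characters, and the extension lists are a reasonable, non-exhaustive guess.
YOUTUBE_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
TRANSCRIPT_EXTS = (".ttml", ".json")
AUDIO_EXTS = (".mp3", ".m4a", ".aac", ".wav", ".ogg", ".opus")


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_input(value: str) -> str:
    """Return a hypothetical input category for the resolver to dispatch on."""
    value = value.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        path = parsed.path.lower()
        if _host_matches(host, "youtube.com") or host == "youtu.be":
            return "youtube"
        if _host_matches(host, "x.com") or _host_matches(host, "twitter.com"):
            return "x_twitter"
        if path.endswith(TRANSCRIPT_EXTS):
            return "transcript_url"
        if path.endswith(AUDIO_EXTS):
            return "audio_url"
        if _host_matches(host, "xiaoyuzhoufm.com"):
            return "xiaoyuzhou_page"
        return "episode_page"
    if value.lower().endswith(TRANSCRIPT_EXTS):
        return "transcript_file"
    if YOUTUBE_ID.match(value):
        return "youtube"
    # Anything else is treated as an episode title, e.g. for an Apple Podcasts search.
    return "title"
```

URL checks run before the bare-ID check, so a full link is never mistaken for a title. The heuristic still has a known gap: a one-word, 11-character title would be read as a YouTube ID.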
Quick Start
Run the script with an input and an output directory:
python3 scripts/podcast_transcript_txt.py --input "https://www.youtube.com/watch?v=n1E9IZfvGMA" --out-dir "/tmp/transcripts"
Then use the generated TXT and .meta.json files.
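For batch use, you might drive the script from Python instead. The sketch below uses only the two flags shown in the Quick Start command (`--input` and `--out-dir`). The helper names are hypothetical. The sketch also assumes that each output TXT has a sibling file named `<stem>.meta.json`. The page implies this ("a matching .meta.json") but does not spell out the naming.

```python
from __future__ import annotations

import json
import os
import subprocess
from pathlib import Path

SCRIPT = "scripts/podcast_transcript_txt.py"  # path from the Quick Start command


def build_command(source: str, out_dir: str, script: str = SCRIPT) -> list[str]:
    """Build the documented command line; only --input and --out-dir are used."""
    return ["python3", script, "--input", source, "--out-dir", out_dir]


def pair_outputs(names: list[str]) -> list[tuple[str, str]]:
    """Pair each <stem>.txt with <stem>.meta.json (assumed naming convention)."""
    available = set(names)
    pairs = []
    for name in sorted(available):
        if name.endswith(".txt"):
            meta = name[: -len(".txt")] + ".meta.json"
            if meta in available:
                pairs.append((name, meta))
    return pairs


def run_and_collect(source: str, out_dir: str) -> list[tuple[Path, dict]]:
    """Run the script, then load every TXT/meta pair found in out_dir."""
    subprocess.run(build_command(source, out_dir), check=True)
    results = []
    for txt_name, meta_name in pair_outputs(os.listdir(out_dir)):
        meta_path = Path(out_dir) / meta_name
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        results.append((Path(out_dir) / txt_name, meta))
    return results
```

The .meta.json field names are not documented here. For that reason, `run_and_collect` returns the parsed JSON as a plain dict rather than assuming specific keys.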
Dependency Matrix
Required Modules
yt-dlp, faster-whisper
Components
scripts, references
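Before running the script, it can help to confirm that both required modules are importable in the current Python environment. Below is a minimal check. It assumes the standard import names `yt_dlp` and `faster_whisper` for the two packages, and the `missing_modules` helper is hypothetical.

```python
import importlib.util

# Distribution name -> import name for the modules listed under Required Modules.
REQUIRED = {"yt-dlp": "yt_dlp", "faster-whisper": "faster_whisper"}


def missing_modules(required: dict = REQUIRED) -> list:
    """Return the distribution names whose import module cannot be found."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules; install with: pip install " + " ".join(missing))
    else:
        print("yt-dlp and faster-whisper are importable.")
```

`find_spec` only locates each module and does not import it. This keeps the check cheap even though faster-whisper pulls in heavy ASR dependencies.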
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: podcast-transcript-txt Download link: https://github.com/KingJing1/podcast-transcript-txt-skill/archive/main.zip#podcast-transcript-txt Please download this .zip file, extract it, and install it in the .claude/skills/ directory.