podcast-transcript-txt

Community

Export podcast transcripts as cleaned TXT.

Author: KingJing1
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Turning podcast episodes into searchable transcripts is slow and inconsistent, especially when official transcripts are missing. This Skill reliably extracts transcript text (or falls back to subtitles or local ASR) and writes a cleaned TXT plus a .meta.json for debugging.

Core Features & Use Cases

  • Deterministic transcript resolution: prioritizes official transcript sources, then YouTube subtitles, then episode/page visible text, and finally local ASR with selectable small|medium models.
  • Multiple input types: handles YouTube URLs/IDs, episode webpages (including Xiaoyuzhou), official transcript URLs/files (TTML/JSON), direct audio URLs, Apple Podcasts title search, X/Twitter links (best-effort), and plain episode titles.
  • Observable output quality: always emits a matching .meta.json containing resolver path and quality/attempt diagnostics so you know whether the result is official/subtitle/page-text/ASR-derived.
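The resolution order and the diagnostic sidecar described above can be sketched as a simple fallback chain. This is a minimal illustration, not the Skill's actual code: the `Resolver` signature, the `Resolution` class, the resolver names, and the `source`/`attempts` keys in the metadata are all assumptions made for the example.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional

# Hypothetical resolver signature: takes the raw user input and returns
# transcript text, or None when this source has nothing to offer.
Resolver = Callable[[str], Optional[str]]


@dataclass
class Resolution:
    text: str                      # transcript text, stripped of outer whitespace
    source: str                    # name of the resolver that succeeded
    attempts: list = field(default_factory=list)  # one record per resolver tried


def resolve_transcript(user_input: str,
                       resolvers: list[tuple[str, Resolver]]) -> Resolution:
    """Try resolvers in the given fixed order and return the first usable result."""
    attempts = []
    for name, resolver in resolvers:
        try:
            text = resolver(user_input)
        except Exception as exc:  # record the failure and fall through to the next source
            attempts.append({"resolver": name, "ok": False, "error": str(exc)})
            continue
        if text and text.strip():
            attempts.append({"resolver": name, "ok": True, "chars": len(text)})
            return Resolution(text=text.strip(), source=name, attempts=attempts)
        attempts.append({"resolver": name, "ok": False, "error": "empty result"})
    raise RuntimeError(f"no resolver produced a transcript; attempts: {attempts}")


def write_outputs(res: Resolution, out_dir: Path, stem: str) -> tuple[Path, Path]:
    """Write the TXT and a matching .meta.json (assumed layout) side by side."""
    out_dir.mkdir(parents=True, exist_ok=True)
    txt_path = out_dir / f"{stem}.txt"
    meta_path = out_dir / f"{stem}.meta.json"
    txt_path.write_text(res.text + "\n", encoding="utf-8")
    meta = {"source": res.source, "attempts": res.attempts}
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return txt_path, meta_path
```

Passing the resolvers as an ordered list keeps the priority explicit: official transcript first, then subtitles, then page text, then ASR. Recording every attempt, including failures, is what lets the sidecar explain why a lower-priority source was used.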

Quick Start

Run the script with an input and an output directory, then use the generated TXT and .meta.json:

  python3 scripts/podcast_transcript_txt.py --input "https://www.youtube.com/watch?v=n1E9IZfvGMA" --out-dir "/tmp/transcripts"
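For more than one episode, the same command can be driven from a short wrapper. The sketch below uses only the `--input` and `--out-dir` flags shown in the Quick Start. The helper names `build_command` and `run_batch` are hypothetical, and treating a non-zero exit code as failure is an assumption about the script's behavior.

```python
import subprocess
import sys

# Script path taken from the Quick Start; run from the Skill's root directory.
SCRIPT = "scripts/podcast_transcript_txt.py"


def build_command(episode_input: str, out_dir: str) -> list:
    """Build the argv list for one episode (no shell quoting needed)."""
    return [sys.executable, SCRIPT, "--input", episode_input, "--out-dir", out_dir]


def run_batch(inputs, out_dir):
    """Run the script once per input and collect (input, stderr) for failures.

    Assumption: the script signals failure with a non-zero exit code.
    """
    failures = []
    for item in inputs:
        result = subprocess.run(build_command(item, out_dir),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((item, result.stderr.strip()))
    return failures


if __name__ == "__main__":
    episodes = ["https://www.youtube.com/watch?v=n1E9IZfvGMA"]
    for item, err in run_batch(episodes, "/tmp/transcripts"):
        print(f"failed: {item}\n{err}")
```

Passing arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.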

Dependency Matrix

Required Modules

yt-dlp, faster-whisper

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: podcast-transcript-txt
Download link: https://github.com/KingJing1/podcast-transcript-txt-skill/archive/main.zip#podcast-transcript-txt

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.