podcast-transcript-txt

Name: podcast-transcript-txt
Availability: InStock
Author: KingJing1

Community

Export podcast transcripts as cleaned TXT.

Content & Communication #transcript #web scraping #youtube #podcast #asr #txt export #quality diagnostics

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

Turning podcast episodes into searchable transcripts is slow and inconsistent, especially when official transcripts are missing. This Skill extracts transcript text from official sources where they exist, falls back to subtitles, visible page text, or local ASR where they do not, and writes a cleaned TXT file plus a matching .meta.json for debugging.

Core Features & Use Cases

Deterministic transcript resolution: prioritizes official transcript sources, then YouTube subtitles, then episode/page visible text, and finally local ASR with selectable small|medium models.
Multiple input types: handles YouTube URLs/IDs, episode webpages (including Xiaoyuzhou), official transcript URLs/files (TTML/JSON), direct audio URLs, Apple Podcasts title search, X/Twitter links (best-effort), and plain episode titles.
Observable output quality: always emits a matching .meta.json containing resolver path and quality/attempt diagnostics so you know whether the result is official/subtitle/page-text/ASR-derived.
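The resolution order described above can be pictured as a simple fallback chain. The following is a minimal sketch, not the Skill's actual implementation: the function `resolve_transcript`, the stub resolvers, and the keys `text`, `resolver`, and `attempts` are all hypothetical names chosen for illustration. It only shows the general idea of trying sources in a fixed priority order and recording each attempt for diagnostics.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical resolver type: takes the raw input string and
# returns transcript text, or None when the source has nothing.
Resolver = Callable[[str], Optional[str]]


def resolve_transcript(source: str, resolvers: List[Tuple[str, Resolver]]) -> Dict[str, object]:
    """Try each resolver in priority order and stop at the first non-empty result."""
    attempts = []
    for name, resolver in resolvers:
        try:
            text = resolver(source)
        except Exception as exc:
            # A failing source is recorded and skipped; it does not abort the chain.
            attempts.append({"resolver": name, "ok": False, "error": str(exc)})
            continue
        ok = bool(text and text.strip())
        attempts.append({"resolver": name, "ok": ok})
        if ok:
            return {"text": text.strip(), "resolver": name, "attempts": attempts}
    return {"text": None, "resolver": None, "attempts": attempts}


# Stub resolvers standing in for the real sources (illustrative only).
def official(_: str) -> Optional[str]:
    return None  # pretend no official transcript exists


def youtube_subtitles(_: str) -> Optional[str]:
    return "hello from subtitles"


def page_text(_: str) -> Optional[str]:
    return "visible page text"


def local_asr(_: str) -> Optional[str]:
    return "asr output"


# Priority order taken from the feature description above.
CHAIN = [
    ("official", official),
    ("subtitle", youtube_subtitles),
    ("page-text", page_text),
    ("asr", local_asr),
]

result = resolve_transcript("https://www.youtube.com/watch?v=n1E9IZfvGMA", CHAIN)
print(result["resolver"], len(result["attempts"]))
```

Because the order is fixed and the first usable result wins, the same input should follow the same path on every run, and the recorded attempts show which sources were tried before the one that succeeded.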

Quick Start

Run the script with an input and an output directory:

python3 scripts/podcast_transcript_txt.py --input "https://www.youtube.com/watch?v=n1E9IZfvGMA" --out-dir "/tmp/transcripts"

Then use the generated TXT and .meta.json.
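For scripted use, the Quick Start command can be wrapped in a small helper. This is a sketch under assumptions: `build_command` and `pair_outputs` are hypothetical helpers, and the pairing assumes each `<name>.txt` has its metadata beside it as `<name>.meta.json`, which the listing implies ("matching .meta.json") but does not spell out. Only the `--input` and `--out-dir` flags shown above are used, and no field names inside the metadata file are assumed.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_command(source: str, out_dir: str) -> List[str]:
    """Build the Quick Start command as an argument list (no shell quoting needed)."""
    return [
        "python3",
        "scripts/podcast_transcript_txt.py",
        "--input", source,
        "--out-dir", out_dir,
    ]


def pair_outputs(out_dir: str) -> List[Dict[str, object]]:
    """Pair each TXT file in out_dir with its metadata, if present.

    Assumption: metadata for `<name>.txt` is stored as `<name>.meta.json`.
    """
    pairs = []
    for txt in sorted(Path(out_dir).glob("*.txt")):
        meta_path = txt.with_name(txt.stem + ".meta.json")
        meta = None
        if meta_path.exists():
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        pairs.append({"txt": txt, "meta": meta})
    return pairs
```

Passing the list from `build_command(...)` to `subprocess.run(..., check=True)` from the Skill's root directory would run the same command as above; `pair_outputs("/tmp/transcripts")` could then be used to inspect each transcript alongside its diagnostics.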
