ingest-web

Community

Convert web articles into clean markdown

Author: RonanCodes
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Web articles and blog posts often arrive as cluttered HTML with missing metadata and remote image links, which makes them hard to import cleanly into a markdown-based wiki. This Skill automates extracting the readable content and packaging it for ingestion into a vault.

Core Features & Use Cases

  • Readable extraction: Fetches a URL and extracts the article title, author, published date, and main body while stripping navigation, sidebars, and ads.
  • HTML-to-markdown conversion: Preserves headings, lists, blockquotes, links, code blocks, and image references when converting to clean markdown.
  • Image and asset handling: Downloads referenced images into vault/raw/assets, replaces remote URLs with local paths, and records images-downloaded in the file frontmatter.
  • Metadata-first output: Writes YAML frontmatter containing source-url, title, author, date-fetched, and images-downloaded, and saves the result to raw/<descriptive-slug>.md for downstream wiki workflows.
  • Fallback guidance: Flags pages where extraction is likely to fail (JavaScript-heavy pages and single-page apps) and recommends a browser clipper for better fidelity.
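The packaging steps in the list above (image localization and metadata-first output) can be sketched with the Python standard library. This is a minimal sketch, not the Skill's actual implementation: the helper names (`slugify`, `localize_images`, `build_note`), the slug rule, the hash-based asset file names, JSON-style quoting of YAML values, and treating images-downloaded as a count are all assumptions. Only the frontmatter keys and the raw/ and raw/assets locations come from the description. Fetching, readable extraction, and the actual image downloads are left out.

```python
import hashlib
import json
import re
from datetime import date
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Markdown image syntax pointing at an absolute http(s) URL: ![alt](https://...)
REMOTE_IMAGE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def slugify(title: str, max_len: int = 60) -> str:
    """Hypothetical slug rule: lowercase, collapse non-alphanumerics to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def localize_images(markdown: str, assets_dir: str = "assets") -> tuple[str, dict[str, str]]:
    """Point remote images at local files; return new markdown and {url: local file name}."""
    assets: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if url not in assets:
            suffix = PurePosixPath(urlparse(url).path).suffix or ".img"
            # Assumed naming scheme: a short URL hash, so a repeated image maps to one file.
            assets[url] = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + suffix
        # Link is relative to raw/<slug>.md, so "assets/..." resolves to raw/assets/...
        return f"![{alt}]({assets_dir}/{assets[url]})"

    return REMOTE_IMAGE.sub(replace, markdown), assets


def build_note(title: str, author: str, source_url: str, body: str,
               fetched: Optional[date] = None) -> tuple[str, str, dict[str, str]]:
    """Return (vault-relative path, file text, images still to download)."""
    fetched = fetched or date.today()
    body, assets = localize_images(body)
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar,
    # so titles containing ':' or '#' do not break the frontmatter.
    frontmatter = "\n".join([
        "---",
        f"source-url: {json.dumps(source_url)}",
        f"title: {json.dumps(title)}",
        f"author: {json.dumps(author)}",
        f"date-fetched: {fetched.isoformat()}",
        f"images-downloaded: {len(assets)}",
        "---",
    ])
    return f"raw/{slugify(title)}.md", f"{frontmatter}\n\n{body}\n", assets
```

A caller would then write the text to the returned path and download each URL in the returned mapping into raw/assets under its mapped file name. Hashing the URL keeps asset names stable across re-ingestion of the same article, at the cost of less readable file names.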

Quick Start

Ingest the article at https://example.com/article into vault my-research to extract content, download images, and save a markdown file in raw/.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: ingest-web
Download link: https://github.com/RonanCodes/llm-wiki/archive/main.zip#ingest-web

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
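If you prefer to install by hand, the steps in that prompt can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the function name `install_skill` is hypothetical, and it assumes the skill lives in a folder named `ingest-web` containing a `SKILL.md` file somewhere inside the archive, since the exact path within the repository is not given here. Download the .zip from the link above first; the sketch only extracts and copies.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_skill(zip_path: str, skill_name: str = "ingest-web",
                  skills_root: str = ".claude/skills") -> Path:
    """Extract a repository archive and copy one skill folder into skills_root."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumption: the skill is a directory named skill_name that holds SKILL.md.
        candidates = [p.parent for p in Path(tmp).rglob("SKILL.md")
                      if p.parent.name == skill_name]
        if not candidates:
            raise FileNotFoundError(f"no {skill_name}/SKILL.md found in {zip_path}")
        dest = Path(skills_root) / skill_name
        if dest.exists():
            shutil.rmtree(dest)  # replace any previous install of this skill
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Copy before the temporary directory is cleaned up on exit.
        shutil.copytree(candidates[0], dest)
        return dest
```

Calling `install_skill("main.zip")` with the path of the downloaded archive would copy the folder to `.claude/skills/ingest-web`; searching for `SKILL.md` rather than hard-coding a path keeps the sketch working if the repository layout changes.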
