42-near-duplicates
CommunityIdentify near-duplicates and semantic similarity.
Authorchapter42
Version1.0.0
Installs0
System Documentation
## What problem does it solve? Detects near-duplicate and semantically similar pages to prevent content cannibalization across a site, combining multiple signals for robust conclusions.
## Core Features & Use Cases
- Three-layer approach: Exact duplicates via MD5, near duplicates via MinHash, and semantic similarity via embeddings to surface content cannibalization.
- Component-aware analysis: Evaluates title, H1, meta, and body differences to distinguish canonical pages from related but distinct content.
- Workflow-ready outputs: Produces a NEAR-DUPLICATES report and cluster details to guide canonicalization, merging, or differentiation.
### Quick Start Analyze a Screaming Frog export directory with the /42:near-duplicates command to generate a NEAR-DUPLICATES.md report.
Dependency Matrix
Required Modules
None requiredComponents
Standard package💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: 42-near-duplicates Download link: https://github.com/chapter42/SEO-Skills-42/archive/main.zip#42-near-duplicates Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.