42-near-duplicates

Community

Identify near-duplicates and semantic similarity.

Authorchapter42
Version1.0.0
Installs0

System Documentation

## What problem does it solve? Detects near-duplicate and semantically similar pages to prevent content cannibalization across a site, combining multiple signals for robust conclusions.

## Core Features & Use Cases

  • Three-layer approach: Exact duplicates via MD5, near duplicates via MinHash, and semantic similarity via embeddings to surface content cannibalization.
  • Component-aware analysis: Evaluates title, H1, meta, and body differences to distinguish canonical pages from related but distinct content.
  • Workflow-ready outputs: Produces a NEAR-DUPLICATES report and cluster details to guide canonicalization, merging, or differentiation.

### Quick Start Analyze a Screaming Frog export directory with the /42:near-duplicates command to generate a NEAR-DUPLICATES.md report.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: 42-near-duplicates
Download link: https://github.com/chapter42/SEO-Skills-42/archive/main.zip#42-near-duplicates

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.