claim-dedup-and-compact
Community
Prevent duplicate claims and compact storage
System Documentation
What problem does it solve?
It solves the problem of repeated or duplicate claim records accumulating in an append-only JSONL store, which bloats datasets and makes downstream processing slower and less reliable.
Core Features & Use Cases
- Per-source write-time dedup: Skips duplicates during the Store phase by checking content hashes only within the target by-source file, which keeps the hot path fast.
- On-demand global compaction: Deduplicates and rewrites the master store and all derived indexes (by-topic and by-source) using a deterministic rule that keeps the oldest copy of each claim.
- Atomic, Windows-safe rewrite with backups: Performs an atomic-ish write using temp files plus backup copies (named .bak.<timestamp>) to reduce corruption risk, and supports DryRun to compute the effects without touching any files.
- Debuggable response contract: Returns fields like ExtractedCount and DedupSkipped so you can assert dedup behavior in CI or during operational checks.
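The write-time dedup and the response contract can be illustrated with a minimal Python sketch. It is not the skill's actual implementation: the `text` field, the whitespace and case normalization, and the helper names `content_hash` and `store_claims` are assumptions made for illustration. Only the field names ExtractedCount and DedupSkipped come from the description above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(claim: dict) -> str:
    """Hash the claim text after collapsing whitespace and lowercasing (assumed normalization)."""
    text = " ".join(claim["text"].split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def store_claims(by_source_file: Path, claims: list[dict]) -> dict:
    """Append only claims whose hash is not already in this one by-source file."""
    seen = set()
    if by_source_file.exists():
        with by_source_file.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    seen.add(content_hash(json.loads(line)))

    skipped = 0
    with by_source_file.open("a", encoding="utf-8") as fh:
        for claim in claims:
            digest = content_hash(claim)
            if digest in seen:
                skipped += 1  # duplicate within this source: do not write it
                continue
            seen.add(digest)
            fh.write(json.dumps(claim) + "\n")

    return {"ExtractedCount": len(claims), "DedupSkipped": skipped}


# Hypothetical demo data in a throwaway directory.
demo_file = Path(tempfile.mkdtemp()) / "source-a.jsonl"
first = store_claims(demo_file, [{"text": "Water boils at 100 C."},
                                 {"text": "water boils  at 100 c."}])
second = store_claims(demo_file, [{"text": "Water boils at 100 C."}])
print(first, second)
```

Because only the target by-source file is scanned, the cost of a write grows with that one file rather than with the whole store. The returned counts are the kind of values a CI check could assert on.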
Use case: Your LLM extracts the same factual claim multiple times from the same evidence source. You want to store it once, and later compact the whole claim store so that old duplicates are removed while the indexes stay consistent.
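A keep-the-oldest compaction pass might look like the following Python sketch. It assumes that append order in the master store reflects age (earlier records are older) and that each record carries `text`, `topic`, and `source` fields. Those field names, and the helpers `claim_key` and `compact`, are hypothetical and not taken from the skill itself.

```python
import hashlib
from collections import defaultdict


def claim_key(claim: dict) -> str:
    """Content hash over trimmed, lowercased text (assumed normalization)."""
    return hashlib.sha256(claim["text"].strip().lower().encode("utf-8")).hexdigest()


def compact(master: list[dict]) -> tuple[list[dict], dict, dict]:
    """Keep the first (oldest) record per content hash, then rebuild both indexes."""
    kept, seen = [], set()
    for claim in master:  # append order: earlier means older
        key = claim_key(claim)
        if key in seen:
            continue  # a later duplicate is dropped
        seen.add(key)
        kept.append(claim)

    # Derive the indexes from the survivors only, so they cannot disagree with the master.
    by_topic, by_source = defaultdict(list), defaultdict(list)
    for claim in kept:
        by_topic[claim["topic"]].append(claim)
        by_source[claim["source"]].append(claim)
    return kept, dict(by_topic), dict(by_source)


# Hypothetical records; id 3 duplicates id 1 under the assumed normalization.
master = [
    {"id": 1, "text": "Water boils at 100 C.", "topic": "physics", "source": "a"},
    {"id": 2, "text": "Iron rusts in moist air.", "topic": "chemistry", "source": "b"},
    {"id": 3, "text": "water boils at 100 c. ", "topic": "physics", "source": "b"},
]
kept, by_topic, by_source = compact(master)
print([c["id"] for c in kept])
```

Rebuilding the by-topic and by-source indexes from the compacted master, rather than deduplicating each index separately, is one simple way to keep all three views consistent and the result deterministic.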
Quick Start
Ask your system to run claim store compaction in DryRun mode first to estimate BeforeCount and AfterCount. Once you confirm that the removal count is acceptable, run the real compaction.
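The dry-run-then-commit flow can be sketched in Python as below. This is an illustration under stated assumptions, not the skill's real interface: the `compact_file` function, its `dry_run` parameter, and the `text` field are hypothetical. The temp-file write, the `.bak.<timestamp>` backup, and the BeforeCount and AfterCount fields follow the description above.

```python
import json
import os
import shutil
import tempfile
import time
from pathlib import Path


def compact_file(path: Path, dry_run: bool = True) -> dict:
    """Report dedup effects; rewrite the file only when dry_run is False."""
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    seen, kept = set(), []
    for ln in lines:
        key = json.loads(ln)["text"].strip().lower()  # assumed dedup key
        if key not in seen:
            seen.add(key)
            kept.append(ln)  # first occurrence (oldest) wins

    report = {"BeforeCount": len(lines), "AfterCount": len(kept), "DryRun": dry_run}
    if dry_run or len(kept) == len(lines):
        return report  # nothing is written in DryRun or when there is nothing to remove

    # Back up the original, write the new content to a temp file in the same
    # directory, then swap it into place with os.replace.
    backup = path.with_name(f"{path.name}.bak.{int(time.time())}")
    shutil.copy2(path, backup)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write("\n".join(kept) + "\n")
    os.replace(tmp_name, path)
    return report


# Hypothetical demo store with one duplicate.
store = Path(tempfile.mkdtemp()) / "claims.jsonl"
store.write_text(
    '{"text": "Water boils at 100 C."}\n'
    '{"text": "Iron rusts in moist air."}\n'
    '{"text": "water boils at 100 c."}\n',
    encoding="utf-8",
)
preview = compact_file(store, dry_run=True)
lines_after_preview = len(store.read_text(encoding="utf-8").splitlines())
result = compact_file(store, dry_run=False)
print(preview, result)
```

Creating the temp file in the same directory as the target keeps the final `os.replace` on one volume, which is what lets the swap overwrite the existing file on Windows as well as on POSIX systems.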
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: claim-dedup-and-compact
Download link: https://github.com/transreal/claudecode/archive/main.zip#claim-dedup-and-compact
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.