bm25-tokenizer-guide
Official
Guides BM25 tokenizer choices for fast search.
Author: Mercurium-Analytics
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
The tokenizer chosen for a BM25 index determines the balance between search accuracy, index size, and query performance. This guide helps you pick a tokenizer per field, so that name fields support fast autocomplete while prose and code fields get efficient full-text search.
Core Features & Use Cases
- Provides a decision-tree approach for choosing among default, ngram prefix (autocomplete), ngram substring, and code tokenizers.
- Maps per-field strategies (names, descriptions, filenames) to practical index configurations and explains trade-offs in index size and hit quality.
- Use Case: design a hybrid search for a catalog in which product names need fast suggestions while descriptions need robust phrase matching.
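The decision-tree idea above can be sketched in a few lines of Python. This is only an illustration: the `FieldProfile` class, its flags, the branch order, and the tokenizer labels (`default`, `ngram_prefix`, `ngram_substring`, `code`) are assumptions made for this example, and the guide's actual tree may ask different questions or order them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldProfile:
    """Hypothetical description of how one indexed field is searched."""
    name: str
    needs_autocomplete: bool = False  # users type prefixes and expect suggestions
    needs_substring: bool = False     # users search for fragments inside words
    holds_code: bool = False          # identifiers, paths, camelCase/snake_case


def choose_tokenizer(field: FieldProfile) -> str:
    """Walk a small decision tree and return a tokenizer label (assumed names)."""
    if field.holds_code:
        return "code"
    if field.needs_substring:
        return "ngram_substring"
    if field.needs_autocomplete:
        return "ngram_prefix"
    return "default"


# Example catalog: names get suggestions, descriptions get plain full-text search.
catalog_fields = [
    FieldProfile("product_name", needs_autocomplete=True),
    FieldProfile("description"),
    FieldProfile("filename", holds_code=True),
]

plan = {f.name: choose_tokenizer(f) for f in catalog_fields}
print(plan)
```

The resulting `plan` dictionary is the kind of per-field mapping you would then translate into whatever index configuration syntax your search backend expects.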
Quick Start
Use the decision tree to configure per-field tokenizers and build your BM25 index accordingly.
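To get a feel for the index-size trade-off before configuring anything, it can help to look at what each tokenizer family would emit for the same input. The functions below are simplified pure-Python stand-ins written for this example; they are not the tokenizers shipped by any particular BM25 engine, and the n-gram lengths are arbitrary assumptions.

```python
import re


def default_tokens(text):
    """Lowercased word tokens, roughly what a plain full-text tokenizer emits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_ngrams(text, min_len=2, max_len=10):
    """Edge n-grams: every prefix of each word, used for autocomplete."""
    out = []
    for word in default_tokens(text):
        for n in range(min_len, min(len(word), max_len) + 1):
            out.append(word[:n])
    return out


def substring_ngrams(text, n=3):
    """Sliding n-grams inside each word, used for substring matching."""
    out = []
    for word in default_tokens(text):
        if len(word) < n:
            out.append(word)  # keep short words searchable as-is
        for i in range(len(word) - n + 1):
            out.append(word[i:i + n])
    return out


def code_tokens(text):
    """Split on punctuation, then on camelCase boundaries and digit runs."""
    out = []
    for part in re.findall(r"[A-Za-z0-9]+", text):
        pieces = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
        out.extend(p.lower() for p in pieces)
    return out


sample = "Wireless Mouse"
for tokenizer in (default_tokens, prefix_ngrams, substring_ngrams):
    tokens = tokenizer(sample)
    print(f"{tokenizer.__name__}: {len(tokens)} tokens")

print(code_tokens("getUserById.test.ts"))
```

In this sketch the n-gram variants emit several times as many tokens as the default tokenizer for the same text, which is the cost you pay in index size for prefix or substring matching.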
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: bm25-tokenizer-guide
Download link: https://github.com/Mercurium-Analytics/pg-search-vector/archive/main.zip#bm25-tokenizer-guide
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.