bm25-tokenizer-guide

Official

Guides BM25 tokenizer choices for fast search.

Author: Mercurium-Analytics
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Choosing a tokenizer for a BM25 index is a trade-off among search accuracy, index size, and query performance. This guide helps you pick a tokenizer for each field, so that name fields support autocomplete while prose and code fields get efficient full-text search.

Core Features & Use Cases

  • Provides a decision-tree approach for choosing among default, ngram prefix (autocomplete), ngram substring, and code tokenizers.
  • Maps per-field strategies (names, descriptions, filenames) to practical index configurations and explains trade-offs in index size and hit quality.
  • Use Case: design a hybrid search in a catalog where product names need fast suggestions while descriptions deliver robust phrase matching.
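To make the four tokenizer families concrete, here is a minimal Python sketch of what each one might emit. The function names, the default gram sizes, and the camelCase splitting rule are illustrative assumptions, not the API of any particular BM25 engine or of this skill.

```python
import re


def default_tokens(text):
    """Lowercase the text and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def prefix_ngrams(text, min_gram=2, max_gram=5):
    """Emit the leading n-grams of each word (the autocomplete case)."""
    out = []
    for word in default_tokens(text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            out.append(word[:n])
    return out


def substring_ngrams(text, n=3):
    """Emit every contiguous n-character window of each word."""
    out = []
    for word in default_tokens(text):
        out.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return out


def code_tokens(text):
    """Split identifiers on punctuation, underscores, and camelCase boundaries."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", text)
    return [p.lower() for p in parts]


if __name__ == "__main__":
    print(default_tokens("Red Running-Shoes"))
    print(prefix_ngrams("shoes"))
    print(substring_ngrams("shoes"))
    print(code_tokens("parseHTTPResponse_v2.py"))
```

The n-gram variants emit several tokens per word, which is where the index-size cost comes from: a prefix tokenizer lets a partial query such as "sho" match "shoes", at the price of storing every prefix.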

Quick Start

Use the decision tree to configure per-field tokenizers and build your BM25 index accordingly.
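The decision tree itself is not reproduced on this page, so the following is only a hedged sketch of how such a tree could be expressed. The field kinds, the flag names, the tokenizer labels, and the choice to treat filenames as code-like are all assumptions made for illustration.

```python
def choose_tokenizer(field_kind, needs_autocomplete=False, needs_substring=False):
    """Pick a tokenizer label for one field (hypothetical rules)."""
    if field_kind == "code":
        # Identifiers and paths: split on punctuation and case boundaries.
        return "code"
    if needs_substring:
        # Match anywhere inside a word; largest index of the four options.
        return "ngram_substring"
    if needs_autocomplete:
        # Match on word prefixes only; smaller than substring n-grams.
        return "ngram_prefix"
    # Plain prose: word-level tokens keep the index compact.
    return "default"


# Hypothetical catalog schema used only for this example.
FIELDS = {
    "name": {"field_kind": "name", "needs_autocomplete": True},
    "description": {"field_kind": "prose"},
    "filename": {"field_kind": "code"},
}

index_config = {field: choose_tokenizer(**opts) for field, opts in FIELDS.items()}

if __name__ == "__main__":
    for field, tokenizer in index_config.items():
        print(f"{field}: {tokenizer}")
```

The resulting mapping would then be translated into whatever per-field tokenizer syntax your BM25 index actually accepts.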

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: bm25-tokenizer-guide
Download link: https://github.com/Mercurium-Analytics/pg-search-vector/archive/main.zip#bm25-tokenizer-guide

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.