arabic-token-eval

Community

Evaluate Arabic tokenizers and morphology for NLP tasks.

Author: chabirOael
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill assesses how well different Arabic tokenization approaches preserve morphology and segment words, so that you can choose a tokenizer that improves NLP model performance.

Core Features & Use Cases

  • Tokenizer Evaluation: Compares subword, character, and morphology-aware tokenizers based on morphology metrics and downstream task results.
  • Intrinsic & Morphological Metrics: Calculates root, pattern, and morpheme integrity, clitic separation, and fragmentation ratios to analyze tokenization quality.
  • Use Case: Researchers can use this Skill to select optimal tokenization strategies for Arabic NLP models, ensuring accurate root extraction and minimal fragmentation.
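To make the fragmentation and integrity metrics concrete, here is a minimal sketch in Python. It is not the Skill's own implementation: the function names, the metric definitions (tokens per word, and the share of token cuts that land on gold morpheme boundaries), and the toy tokenizers are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical tokenizer type: any function mapping one word to its tokens.
Tokenizer = Callable[[str], List[str]]


def fragmentation_ratio(words: Sequence[str], tokenize: Tokenizer) -> float:
    """Average number of tokens per word (1.0 means no word is split)."""
    if not words:
        return 0.0
    return sum(len(tokenize(w)) for w in words) / len(words)


def cut_points(pieces: Sequence[str]) -> Set[int]:
    """Character offsets of the internal boundaries between pieces."""
    cuts, offset = set(), 0
    for piece in pieces[:-1]:
        offset += len(piece)
        cuts.add(offset)
    return cuts


def morpheme_integrity(gold_morphemes: Sequence[str], tokens: Sequence[str]) -> float:
    """Share of token cuts that coincide with gold morpheme boundaries.

    Assumed definition: 1.0 means the tokenizer never cuts inside a morpheme.
    """
    token_cuts = cut_points(tokens)
    if not token_cuts:
        return 1.0  # an unsplit word breaks no morpheme
    return len(token_cuts & cut_points(gold_morphemes)) / len(token_cuts)


# Toy tokenizers standing in for real ones (assumptions, not library APIs).
def char_tokenizer(word: str) -> List[str]:
    return list(word)


def chunk3_tokenizer(word: str) -> List[str]:
    return [word[i:i + 3] for i in range(0, len(word), 3)]


# Hand-written gold segmentation: conjunction + preposition + stem + pronoun.
gold = ["و", "ب", "كتاب", "هم"]
word = "".join(gold)

for name, tok in [("char", char_tokenizer), ("chunk3", chunk3_tokenizer)]:
    print(name, fragmentation_ratio([word], tok), morpheme_integrity(gold, tok(word)))
```

Under these definitions a lower fragmentation ratio and a higher integrity score are both preferable, but the two can pull against each other, which is why comparing several tokenizers side by side is useful.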

Quick Start

Use the arabic-token-eval skill to evaluate a new tokenizer on sample Arabic texts and examine morphology scores.

Dependency Matrix

Required Modules

qalsadi, farasapy, pyarabic, disambig-mle-calima-msa-r13

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: arabic-token-eval
Download link: https://github.com/chabirOael/tokenizers_evaluation/archive/main.zip#arabic-token-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
