arabic-token-eval
Community
Evaluate Arabic tokenizers and morphology for NLP tasks.
Author: chabirOael
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill assesses how well different Arabic tokenization approaches preserve word morphology and how cleanly they segment text, so that you can choose tokenizers that improve NLP model performance.
Core Features & Use Cases
- Tokenizer Evaluation: Compares subword, character, and morphology-aware tokenizers based on morphology metrics and downstream task results.
- Intrinsic & Morphological Metrics: Calculates root, pattern, and morpheme integrity, clitic separation, and fragmentation ratio to analyze tokenization quality.
- Use Case: Researchers can use this Skill to select tokenization strategies for Arabic NLP models that keep roots intact and minimize fragmentation.
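The listing does not define its metrics, so the following is only a minimal sketch of two common formulations, assumed here for illustration. The fragmentation ratio is taken as the average number of tokens per word (often called fertility). A boundary F1 score, which compares a tokenizer's split points with a gold morpheme segmentation, is one possible way to approximate how well clitic and morpheme boundaries are respected. The function names `fragmentation_ratio`, `split_points`, and `boundary_f1` are hypothetical and not part of the Skill.

```python
from typing import Callable


def fragmentation_ratio(words: list[str], tokenize: Callable[[str], list[str]]) -> float:
    """Average number of tokens per word (often called fertility)."""
    if not words:
        return 0.0
    return sum(len(tokenize(w)) for w in words) / len(words)


def split_points(pieces: list[str]) -> set[int]:
    """Character offsets of the internal boundaries in one segmented word.

    Assumes the pieces concatenate back to the original word, i.e. subword
    markers such as '##' or '▁' were stripped beforehand.
    """
    offsets: set[int] = set()
    position = 0
    for piece in pieces[:-1]:  # the last piece ends at the word edge, not at a boundary
        position += len(piece)
        offsets.add(position)
    return offsets


def boundary_f1(predicted: list[list[str]], gold: list[list[str]]) -> float:
    """Micro-averaged F1 of predicted split points against a gold segmentation."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same words")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, r = split_points(pred), split_points(ref)
        tp += len(p & r)  # boundaries found in both
        fp += len(p - r)  # spurious splits
        fn += len(r - p)  # missed morpheme boundaries
    if tp == 0:
        return 0.0  # convention: no correct boundaries (or none at all) scores 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Gold segmentations: و+كتاب+هم ("and their book"), ب+ال+قلم ("with the pen")
    gold = [["و", "كتاب", "هم"], ["ب", "ال", "قلم"]]
    predicted = [["و", "كتاب", "هم"], ["بال", "قلم"]]  # misses the ب|ال boundary
    print(round(boundary_f1(predicted, gold), 3))  # 0.857
    print(fragmentation_ratio(["وكتابهم", "بالقلم"], list))  # character-level: 6.5
```

Micro-averaging pools boundaries across all words, so long, heavily cliticized words weigh more than short ones; a per-word average is an equally reasonable alternative.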
Quick Start
Use the arabic-token-eval skill to evaluate a new tokenizer on sample Arabic texts and examine morphology scores.
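As a rough illustration of what such an evaluation run involves, the sketch below scores several tokenizers on the same sample text. It is not the Skill's own script: the `evaluate` harness, the `toy_clitic_split` splitter, and both reported scores are hypothetical. The toy splitter only peels off one proclitic and the article ال, so it stands in for a real morphology-aware segmenter such as Farasa purely for demonstration.

```python
import re
from typing import Callable

Tokenizer = Callable[[str], list[str]]

# Hypothetical toy splitter, for illustration only: peels off at most one
# proclitic (و, ف, ب, ل, ك) and the definite article ال. It over-segments
# words such as كتب and is NOT a substitute for a real segmenter.
_PROCLITICS = re.compile(r"^([وفبلك]?)(ال)?(.+)$")


def toy_clitic_split(word: str) -> list[str]:
    match = _PROCLITICS.match(word)
    if match is None:  # only happens for the empty string
        return []
    return [part for part in match.groups() if part]


def evaluate(tokenizers: dict[str, Tokenizer], texts: list[str]) -> dict[str, dict[str, float]]:
    """Score each tokenizer on the whitespace-delimited words of `texts`."""
    words = [w for text in texts for w in text.split()]
    if not words:
        raise ValueError("no words to evaluate")
    results: dict[str, dict[str, float]] = {}
    for name, tokenize in tokenizers.items():
        counts = [len(tokenize(w)) for w in words]
        results[name] = {
            # average tokens per word (higher means more fragmentation)
            "tokens_per_word": sum(counts) / len(words),
            # share of words the tokenizer leaves as a single token
            "unsplit_word_rate": sum(c == 1 for c in counts) / len(words),
        }
    return results


if __name__ == "__main__":
    samples = ["درس الطالب الدرس بالقلم"]  # "The student studied the lesson with the pen"
    report = evaluate(
        {"whitespace": lambda w: [w], "character": list, "toy_clitic": toy_clitic_split},
        samples,
    )
    for name, scores in report.items():
        print(name, scores)
```

Whitespace and character tokenizers bracket the range (1 token per word versus 1 token per letter), which makes them handy baselines when placing a new tokenizer between the two extremes.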
Dependency Matrix
Required Modules
- qalsadi
- farasapy
- pyarabic
- disambig-mle-calima-msa-r13
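The listing does not say how the dependencies are verified. A minimal standard-library sketch for confirming that the Python packages are importable follows. The import names are assumptions: farasapy is imported as `farasa`, and disambig-mle-calima-msa-r13 appears to be a CAMeL Tools data package rather than a Python module, so it is left out of the check. `REQUIRED_IMPORTS` and `missing_modules` are hypothetical names.

```python
import importlib.util
from typing import Mapping

# Assumed mapping from the distribution names in the Required Modules list to
# the names they are imported under. farasapy installs the `farasa` package.
# disambig-mle-calima-msa-r13 is omitted: it appears to be a CAMeL Tools data
# package, not an importable Python module.
REQUIRED_IMPORTS: Mapping[str, str] = {
    "qalsadi": "qalsadi",
    "farasapy": "farasa",
    "pyarabic": "pyarabic",
}


def missing_modules(required: Mapping[str, str] = REQUIRED_IMPORTS) -> list[str]:
    """Return the distribution names whose import package cannot be found."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` locates a module without importing it, so the check is cheap and does not trigger any heavy initialization inside the packages themselves.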
Components
- scripts
- references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Simply copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: arabic-token-eval Download link: https://github.com/chabirOael/tokenizers_evaluation/archive/main.zip#arabic-token-eval Please download this .zip file, extract it, and install it in the .claude/skills/ directory.