gsm8k-eval
Community
GSM8K evaluation protocol and answer extraction.
Author: JoaquinCampo
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill standardizes how model reasoning on GSM8K math problems is scored: it extracts the final numeric answer from each model output and compares it against the ground truth in a consistent way.
Core Features & Use Cases
- Defines extraction rules for final answers across common GSM8K output formats ("####", "The answer is", "Answer:", and a last-number fallback), and normalizes the extracted numbers so that comparisons are fair.
- Provides guidance for dataset loading, ground-truth extraction, and accuracy scoring to enable reproducible research and benchmarking.
- Useful for research teams evaluating language models on math-word problems, verifying prompts, and comparing decoding strategies.
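The extraction rules listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the names `NUMBER`, `PATTERNS`, `normalize_number`, and `extract_answer` are hypothetical, and the priority order (explicit markers first, last number as fallback) and the normalization (dropping thousands separators, comparing as decimals) are assumptions.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# A number with optional sign, thousands separators, and decimal part.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"

# Assumed priority order: explicit answer markers win over the
# "last number in the text" fallback (the final pattern).
PATTERNS = [
    re.compile(r"####\s*\$?\s*(" + NUMBER + ")"),
    re.compile(r"[Tt]he answer is:?\s*\$?\s*(" + NUMBER + ")"),
    re.compile(r"[Aa]nswer:\s*\$?\s*(" + NUMBER + ")"),
    re.compile(r"(" + NUMBER + ")"),
]


def normalize_number(raw: str) -> Optional[Decimal]:
    """Drop thousands separators and parse, so "18.0" equals "18"."""
    try:
        return Decimal(raw.replace(",", ""))
    except InvalidOperation:
        return None


def extract_answer(text: str) -> Optional[Decimal]:
    """Return the last match of the first pattern that matches, or None."""
    for pattern in PATTERNS:
        matches = pattern.findall(text)
        if matches:
            return normalize_number(matches[-1])
    return None


print(extract_answer("She has 3 + 4 = 7 apples.\n#### 7"))
print(extract_answer("So 5 plus 6 gives 11 in total"))
```

Returning a `Decimal` rather than a string means that values such as "1,234.50" and "1234.5" compare as equal without further string handling.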
Quick Start
Run the GSM8K evaluation workflow: score model outputs by applying the multi-pattern extractor and comparing each extracted answer against the official ground-truth answer.
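A scoring step along these lines might look like the sketch below. It assumes the published GSM8K layout, in which each reference solution ends with "#### " followed by the final number, and it takes predictions that have already been reduced to a final-answer string. The helper names `parse_number`, `ground_truth`, and `accuracy` are hypothetical, and the sample rows are made up for illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Sequence


def parse_number(raw: str) -> Optional[Decimal]:
    """Parse a numeric string, ignoring commas, a dollar sign, and a trailing period."""
    cleaned = raw.strip().replace(",", "").replace("$", "").rstrip(".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def ground_truth(answer_field: str) -> Optional[Decimal]:
    """Take the number after the last "####" marker in a reference solution."""
    if "####" not in answer_field:
        return None
    return parse_number(answer_field.rsplit("####", 1)[1])


def accuracy(predictions: Sequence[Optional[str]], answer_fields: Sequence[str]) -> float:
    """Fraction of examples whose prediction equals the ground truth numerically."""
    if len(predictions) != len(answer_fields):
        raise ValueError("predictions and answer_fields must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for pred, field in zip(predictions, answer_fields):
        p = parse_number(pred) if pred is not None else None
        g = ground_truth(field)
        # A missing or unparseable answer on either side counts as incorrect.
        if p is not None and g is not None and p == g:
            correct += 1
    return correct / len(predictions)


# Made-up rows in the assumed GSM8K answer format.
references = ["3 + 4 = 7\n#### 7", "10 * 100 = 1000\n#### 1,000", "2 + 2 = 4\n#### 4"]
predicted = ["7.0", "999", None]
print(accuracy(predicted, references))
```

Counting a failed extraction as incorrect, rather than dropping the example, keeps the denominator fixed, so accuracy figures stay comparable across prompts and decoding strategies.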
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: gsm8k-eval Download link: https://github.com/JoaquinCampo/Skills/archive/main.zip#gsm8k-eval Please download this .zip file, extract it, and install it in the .claude/skills/ directory.