gsm8k-eval

Community

GSM8K evaluation protocol and answer extraction.

Author: JoaquinCampo
Version: 1.0.0

System Documentation

What problem does it solve?

The gsm8k-eval skill standardizes how model reasoning on GSM8K math word problems is scored. Only the final numeric answer is extracted from each model output, and it is compared against the ground truth under consistent normalization rules.
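
The comparison step can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own code. The names `normalize_answer` and `answers_match` are hypothetical, and the normalization rules are assumptions about what "consistent comparison" involves: drop commas, `$`, `%`, and a trailing period, then compare the results as `Decimal` values.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_answer(text: str) -> Optional[Decimal]:
    """Turn a raw answer string such as "$1,234" into a Decimal, or None."""
    # Assumed rules: drop whitespace, thousands separators, and $ / % symbols.
    cleaned = text.strip().replace(",", "").replace("$", "").replace("%", "")
    # A sentence-final period (as in "18.") is not part of the number.
    cleaned = cleaned.rstrip(".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def answers_match(predicted: str, reference: str) -> bool:
    """Compare numerically so that "18", "18.0", and "18.00" count as equal."""
    pred, ref = normalize_answer(predicted), normalize_answer(reference)
    return pred is not None and ref is not None and pred == ref


print(answers_match("$1,234", "1234"))  # True
print(answers_match("18.0", "18"))      # True
print(answers_match("17", "18"))        # False
```

Comparing `Decimal` values instead of raw strings means that formatting differences alone, such as a trailing ".0", never change the score.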

Core Features & Use Cases

  • Defines extraction rules for final answers across common GSM8K output formats ("####", "The answer is", "Answer:", and a last-number fallback) and normalizes the extracted values so comparisons are fair.
  • Provides guidance for dataset loading, ground-truth extraction, and accuracy scoring to enable reproducible research and benchmarking.
  • Useful for research teams evaluating language models on math word problems, verifying prompts, and comparing decoding strategies.
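
The extraction rules above can be approximated with an ordered list of regular expressions. This is a minimal sketch. It assumes the patterns are tried in the order the feature list gives them and that the last match of the first matching pattern wins. `extract_final_answer` and the exact regexes are hypothetical and are not the skill's published implementation.

```python
import re
from typing import Optional

# A number with an optional minus sign, thousands separators, and decimals.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"

# Marker patterns in assumed priority order; each captures the number after it.
MARKER_PATTERNS = [
    re.compile(r"####\s*\$?(" + NUMBER + r")"),
    re.compile(r"the answer is\s*:?\s*\$?(" + NUMBER + r")", re.IGNORECASE),
    re.compile(r"answer:\s*\$?(" + NUMBER + r")", re.IGNORECASE),
]


def extract_final_answer(text: str) -> Optional[str]:
    """Return the model's final numeric answer as a comma-free string, or None."""
    for pattern in MARKER_PATTERNS:
        matches = pattern.findall(text)
        if matches:
            # Take the last match in case the model restates its answer.
            return matches[-1].replace(",", "")
    # Fallback: the last number anywhere in the output.
    numbers = re.findall(NUMBER, text)
    return numbers[-1].replace(",", "") if numbers else None


print(extract_final_answer("4 * 3 = 12 cookies.\n#### 12"))    # 12
print(extract_final_answer("So the answer is $1,080."))        # 1080
print(extract_final_answer("Answer: -3"))                      # -3
print(extract_final_answer("3 apples plus 5 apples makes 8"))  # 8
print(extract_final_answer("I am not sure."))                  # None
```

The explicit markers come first because the last-number fallback can pick up intermediate values from the reasoning. That is why it is used only when no marker is present.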

Quick Start

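Run the GSM8K evaluation workflow in three steps. First, take the ground truth from the official dataset answers. Second, extract each model's final answer with the multi-pattern extractor. Third, score accuracy by comparing the two.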
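
The scoring end of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the skill's code. Each official GSM8K `answer` field ends with a line of the form "#### <number>". With the Hugging Face `datasets` library the data can be loaded via `load_dataset("openai/gsm8k", "main")`, whose `test` split has `question` and `answer` fields. Predictions are assumed to be already extracted strings, or None when extraction failed. The helper names `extract_ground_truth`, `to_decimal`, and `accuracy` are hypothetical, and the two reference strings are made-up toy records in the GSM8K format.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Sequence


def extract_ground_truth(answer_field: str) -> str:
    """Return the number after the final '####' marker of a GSM8K answer field."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def to_decimal(text: Optional[str]) -> Optional[Decimal]:
    # Missing or unparsable values become None and are scored as wrong.
    if text is None:
        return None
    try:
        return Decimal(text.strip().replace(",", ""))
    except InvalidOperation:
        return None


def accuracy(predictions: Sequence[Optional[str]], answer_fields: Sequence[str]) -> float:
    """Fraction of examples whose extracted prediction equals the ground truth."""
    if len(predictions) != len(answer_fields):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for pred, field in zip(predictions, answer_fields):
        gold = to_decimal(extract_ground_truth(field))
        guess = to_decimal(pred)
        if guess is not None and gold is not None and guess == gold:
            correct += 1
    return correct / len(predictions)


# Made-up toy records in the GSM8K answer format (reasoning, then "#### <number>").
refs = [
    "Tom buys 3 + 4 = <<3+4=7>>7 apples.\n#### 7",
    "The trip costs 12 * 100 = <<12*100=1200>>1200 dollars.\n#### 1,200",
]
print(accuracy(["7", "1100"], refs))  # 0.5
```

Because extraction and scoring are kept separate, the same `accuracy` function can be reused unchanged when comparing prompts or decoding strategies.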

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install the skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: gsm8k-eval
Download link: https://github.com/JoaquinCampo/Skills/archive/main.zip#gsm8k-eval

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
