gstack-benchmark-models

Community

Pick the best model with real data.

Author: anilveersingh1308
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill takes the guesswork out of choosing the best AI model for a given task: it runs the same prompt across providers and compares measurable outcomes such as latency and cost.
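A rough sketch of what a single "same prompt, several providers" comparison involves is shown below. It assumes each provider can be wrapped as a plain function returning the output text plus input and output token counts. The provider callables, model names, and per-million-token prices are hypothetical placeholders, not the skill's actual adapters or real vendor pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical adapter shape: prompt -> (output_text, input_tokens, output_tokens).
# Real adapters would wrap each vendor's CLI or SDK; these are NOT the skill's interfaces.
Provider = Callable[[str], tuple[str, int, int]]


@dataclass
class RunResult:
    model: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


def run_shootout(prompt: str,
                 providers: dict[str, Provider],
                 prices_per_mtok: dict[str, tuple[float, float]]) -> list[RunResult]:
    """Send one prompt to every provider and record latency, tokens, and cost."""
    results = []
    for model, call in providers.items():
        start = time.perf_counter()
        _output, tok_in, tok_out = call(prompt)
        latency = time.perf_counter() - start  # wall-clock time for this call
        price_in, price_out = prices_per_mtok[model]
        # Prices are assumed to be quoted per million tokens.
        cost = (tok_in * price_in + tok_out * price_out) / 1_000_000
        results.append(RunResult(model, latency, tok_in, tok_out, cost))
    # Cheapest first; ties broken by lower latency.
    return sorted(results, key=lambda r: (r.cost_usd, r.latency_s))


# Usage with stub providers (hypothetical names and prices).
stubs = {
    "model-a": lambda p: ("ok", 100, 50),
    "model-b": lambda p: ("ok", 100, 200),
}
prices = {"model-a": (3.0, 15.0), "model-b": (1.0, 2.0)}
for r in run_shootout("Summarize this diff", stubs, prices):
    print(f"{r.model}: {r.latency_s:.3f}s, {r.input_tokens}+{r.output_tokens} tokens, ${r.cost_usd:.6f}")
```

Sorting by cost first is only one possible ranking; a quality score (for example from an LLM judge) could be added as another field and weighted in the sort key.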

Core Features & Use Cases

  • Cross-model prompt shootouts: Executes the same prompt through Claude, GPT (via Codex CLI), and Gemini side by side to compare performance.
  • Actionable comparisons: Reports latency, token usage, and cost, and can optionally add an LLM judge score for output quality.
  • Safe preflight workflow: Forces an auth-aware dry-run first so you can see which providers are ready before any paid benchmark run.
  • Repeatable baselines: Can save benchmark results as JSON so you can compare future runs and detect performance drift.
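Saving a baseline and checking later runs for drift could look roughly like the sketch below. The JSON schema (model name mapped to `latency_s` and `cost_usd`), the file name, and the 20% tolerance are all assumptions for illustration; the skill's real baseline format is not documented here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: {"model-name": {"latency_s": float, "cost_usd": float}}.


def save_baseline(path: Path, metrics: dict[str, dict[str, float]]) -> None:
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def load_baseline(path: Path) -> dict[str, dict[str, float]]:
    return json.loads(path.read_text())


def detect_drift(baseline: dict[str, dict[str, float]],
                 current: dict[str, dict[str, float]],
                 tolerance: float = 0.2) -> list[tuple[str, str, float, float]]:
    """Return (model, metric, old, new) for each metric that grew by more than `tolerance` (a fraction)."""
    drifted = []
    for model, old_metrics in baseline.items():
        new_metrics = current.get(model)
        if new_metrics is None:
            continue  # model was not benchmarked in this run
        for metric, old in old_metrics.items():
            new = new_metrics.get(metric)
            if new is None or old <= 0:
                continue  # skip missing metrics and avoid dividing by zero
            if (new - old) / old > tolerance:
                drifted.append((model, metric, old, new))
    return drifted


# Usage: write a baseline, then compare a newer (slower) run against it.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "baseline.json"  # hypothetical file name
    save_baseline(path, {"model-a": {"latency_s": 2.0, "cost_usd": 0.01}})
    print(detect_drift(load_baseline(path), {"model-a": {"latency_s": 3.0, "cost_usd": 0.01}}))
```

Only increases are flagged here, since for latency and cost a decrease is an improvement rather than a regression.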

Quick Start

Run gstack-benchmark-models and choose a prompt (or select a benchmarkable gstack skill) to compare Claude, GPT, and Gemini on the same input.
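Before any paid run, the skill performs an auth-aware dry-run. A crude approximation of such a preflight check is sketched below: it looks for each CLI on `PATH` and for an API-key environment variable. The binary names (`claude`, `codex`, `gemini`) and key variables are assumptions about a typical setup, not the skill's actual checks; these CLIs can also authenticate through a login session, which this sketch cannot detect.

```python
import os
import shutil

# Hypothetical readiness table: provider -> (CLI binary, API-key env var).
PROVIDERS = {
    "claude": ("claude", "ANTHROPIC_API_KEY"),
    "gpt": ("codex", "OPENAI_API_KEY"),
    "gemini": ("gemini", "GEMINI_API_KEY"),
}


def preflight(providers=PROVIDERS, env=None, which=shutil.which) -> dict[str, str]:
    """Dry-run: report which providers look usable without sending any prompt."""
    env = os.environ if env is None else env
    report = {}
    for name, (binary, key_var) in providers.items():
        if which(binary) is None:
            report[name] = "missing CLI"
        elif not env.get(key_var):
            report[name] = "no API key in env"  # may still be logged in via the CLI
        else:
            report[name] = "ready"
    return report


for provider, status in preflight().items():
    print(f"{provider}: {status}")
```

Passing `env` and `which` as parameters keeps the check side-effect free and easy to exercise with fakes.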

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: gstack-benchmark-models
Download link: https://github.com/anilveersingh1308/copilot-skills/archive/main.zip#gstack-benchmark-models

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
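For a manual install, the download-and-extract step can be sketched as follows. It assumes the skill lives in a directory named `gstack-benchmark-models` somewhere inside the repository archive (the `#gstack-benchmark-models` fragment is not sent to the server, so the download is the whole repository). The helper names `fetch_archive` and `install_skill` are hypothetical.

```python
import io
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

ARCHIVE_URL = "https://github.com/anilveersingh1308/copilot-skills/archive/main.zip"


def fetch_archive(url: str = ARCHIVE_URL) -> bytes:
    """Download the repository archive into memory."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def install_skill(zip_bytes: bytes, skill_name: str, skills_dir: Path) -> Path:
    """Copy every file under a directory named `skill_name` into skills_dir/skill_name."""
    dest = skills_dir / skill_name
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for member in zf.infolist():
            if member.is_dir():
                continue
            parts = PurePosixPath(member.filename).parts
            if skill_name not in parts:
                continue  # file belongs to some other skill
            # Keep only the path below the skill folder, e.g. "SKILL.md".
            rel = parts[parts.index(skill_name) + 1:]
            if not rel or ".." in rel:
                continue  # skip empty or unsafe paths
            target = dest.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))
    return dest
```

Run from the project root, `install_skill(fetch_archive(), "gstack-benchmark-models", Path(".claude/skills"))` would place the skill in `.claude/skills/gstack-benchmark-models`, under the directory-layout assumption above.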