Name: gstack-benchmark-models
Author: anilveersingh1308

System Documentation

What problem does it solve?

This Skill takes the guesswork out of choosing an AI model for a given task. It runs the same prompt across providers and compares measurable outcomes such as latency, token usage, and cost.
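The Python sketch below shows what such a comparison measures. Everything in it is hypothetical: the `Provider` callable signature, `run_shootout`, and the per-million-token prices are illustrative placeholders. They are not the Skill's actual interface or any provider's real pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signature: takes a prompt and returns
# (output_text, input_tokens, output_tokens). Real CLIs and APIs differ.
Provider = Callable[[str], Tuple[str, int, int]]


@dataclass
class Result:
    provider: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


def run_shootout(prompt: str,
                 providers: Dict[str, Provider],
                 prices_per_mtok: Dict[str, Tuple[float, float]]) -> List[Result]:
    """Send the same prompt to every provider and record latency, tokens, and cost."""
    results = []
    for name, call in providers.items():
        start = time.perf_counter()
        _text, tok_in, tok_out = call(prompt)
        latency = time.perf_counter() - start
        # Placeholder prices: (input, output) in USD per million tokens.
        price_in, price_out = prices_per_mtok[name]
        cost = (tok_in * price_in + tok_out * price_out) / 1_000_000
        results.append(Result(name, latency, tok_in, tok_out, cost))
    # Cheapest run first.
    return sorted(results, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    # Fake providers stand in for real model calls.
    fake = {"claude": lambda p: ("ok", 100, 50), "gpt": lambda p: ("ok", 100, 80)}
    prices = {"claude": (3.0, 15.0), "gpt": (2.0, 8.0)}
    for r in run_shootout("Summarize this file.", fake, prices):
        print(f"{r.provider:7} {r.latency_s:.4f}s  ${r.cost_usd:.5f}")
```

The fake providers make the bookkeeping testable without network calls. A real harness would wrap each CLI or API call in a function with the same shape.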

Core Features & Use Cases

Cross-model prompt shootouts: Runs the same prompt through Claude, GPT (via the Codex CLI), and Gemini side by side to compare their performance.
Actionable comparisons: Reports latency, token usage, and cost for each model, and can optionally add an LLM-judge score for output quality.
Safe preflight workflow: Requires an auth-aware dry run first, so you can see which providers are ready before starting any paid benchmark run.
Repeatable baselines: Can save benchmark results as JSON, so future runs can be compared against them to detect performance drift.
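The baseline idea can be sketched as follows. This assumes a hypothetical JSON layout of `{provider: {metric: value}}` and a fractional drift tolerance. Both the format and the helper names `save_baseline` and `detect_drift` are illustrative, not the file the Skill actually writes.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: {"claude": {"latency_s": 2.0, "cost_usd": 0.01}, ...}
Metrics = Dict[str, Dict[str, float]]


def save_baseline(path: Path, metrics: Metrics) -> None:
    """Write per-provider metrics to a JSON baseline file."""
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def detect_drift(path: Path, current: Metrics, tolerance: float = 0.2) -> List[str]:
    """Report metrics whose relative change from the baseline exceeds `tolerance`."""
    baseline = json.loads(path.read_text())
    drifted = []
    for provider, metrics in current.items():
        old = baseline.get(provider)
        if old is None:
            continue  # No baseline recorded for this provider yet.
        for key, new_value in metrics.items():
            old_value = old.get(key)
            if not old_value:
                continue  # Missing or zero baseline: relative change is undefined.
            change = (new_value - old_value) / old_value
            if abs(change) > tolerance:
                drifted.append(f"{provider}.{key}: {old_value} -> {new_value} ({change:+.0%})")
    return drifted
```

A relative threshold is used here so that one tolerance works for metrics on very different scales, such as seconds and fractions of a dollar.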

Quick Start

Run gstack-benchmark-models and choose a prompt (or select a benchmarkable gstack skill) to compare Claude, GPT, and Gemini on the same input.
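The preflight dry run described above could look roughly like this. It is a hypothetical sketch that assumes readiness means "CLI binary on PATH plus an API-key environment variable". The binary names (`claude`, `codex`, `gemini`), the env-var names, and `preflight` are assumptions for illustration. The Skill's real auth checks may differ; for example, a CLI login session can work without any key set.

```python
import os
import shutil
from typing import Dict, Mapping, Optional

# Assumed mapping: provider -> (CLI binary, API-key env var). Hypothetical.
PROVIDERS = {
    "claude": ("claude", "ANTHROPIC_API_KEY"),
    "gpt": ("codex", "OPENAI_API_KEY"),
    "gemini": ("gemini", "GEMINI_API_KEY"),
}


def preflight(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Dry run: report each provider's readiness without sending any prompt."""
    env = os.environ if env is None else env
    status = {}
    for name, (binary, key_var) in PROVIDERS.items():
        if shutil.which(binary) is None:
            status[name] = f"skip: '{binary}' not on PATH"
        elif not env.get(key_var):
            status[name] = f"check auth: {key_var} unset (CLI login may still work)"
        else:
            status[name] = "ready"
    return status


if __name__ == "__main__":
    for name, state in preflight().items():
        print(f"{name:7} {state}")
```

Because nothing is sent to any model, this step costs nothing. It lets you drop unready providers before a paid run.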

To install with Claude Code, send it this request:

Please help me install this Skill: Name: gstack-benchmark-models Download link: https://github.com/anilveersingh1308/copilot-skills/archive/main.zip#gstack-benchmark-models Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
