evaluate-cortex-agent
Community
Benchmark Cortex Agents with Snowflake Eval
Author: randoneering
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides a structured, repeatable workflow to evaluate Cortex Agents using Snowflake’s native Agent Evaluations, enabling objective benchmarking and comparison of agent performance across configurations.
Core Features & Use Cases
- Define evaluation datasets for Cortex Agents and track metrics such as correctness, tool_selection_accuracy, tool_execution_accuracy, and logical_consistency.
- Automate setup of evaluation runs in Snowflake and generate Snowsight reports.
- Support scenario-based comparisons to measure improvements after prompt changes, tool changes, or configuration updates.
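The scenario-based comparison described above can be sketched in plain Python. This is a minimal illustration, not the Skill's own code: it assumes per-question scores have already been exported from an evaluation run as rows with one value between 0 and 1 per metric. The function names `summarize` and `compare` and the sample scores are hypothetical placeholders, not real results.

```python
from statistics import mean

# Metric names taken from the feature list above.
METRICS = (
    "correctness",
    "tool_selection_accuracy",
    "tool_execution_accuracy",
    "logical_consistency",
)


def summarize(rows):
    """Average each metric over a list of per-question score rows."""
    return {m: mean(row[m] for row in rows) for m in METRICS}


def compare(baseline_rows, candidate_rows):
    """Return candidate minus baseline for each metric (positive = improvement)."""
    base = summarize(baseline_rows)
    cand = summarize(candidate_rows)
    return {m: round(cand[m] - base[m], 4) for m in METRICS}


# Placeholder scores for two questions under each configuration (assumed data).
baseline = [
    {"correctness": 0.5, "tool_selection_accuracy": 1.0,
     "tool_execution_accuracy": 0.5, "logical_consistency": 1.0},
    {"correctness": 1.0, "tool_selection_accuracy": 1.0,
     "tool_execution_accuracy": 0.5, "logical_consistency": 0.5},
]
candidate = [
    {"correctness": 1.0, "tool_selection_accuracy": 1.0,
     "tool_execution_accuracy": 1.0, "logical_consistency": 0.5},
    {"correctness": 1.0, "tool_selection_accuracy": 1.0,
     "tool_execution_accuracy": 0.5, "logical_consistency": 0.5},
]

deltas = compare(baseline, candidate)
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.2f}")
```

Reporting a signed delta per metric makes regressions as visible as gains, which matters when a prompt change improves correctness but hurts consistency.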
Quick Start
Configure the target agent, select metrics, build or choose a dataset, run the evaluation, and review results in Snowsight.
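As a rough sketch of the first three Quick Start steps, the snippet below captures the agent, metrics, and dataset in one object and checks them before anything is run. It does not call Snowflake. The `EvalPlan` class, its field names, and the example object names are assumptions made for illustration, and `DOCUMENTED_METRICS` covers only the metrics named on this page.

```python
from dataclasses import dataclass, field

# Only the metrics this page mentions; the real product may offer others.
DOCUMENTED_METRICS = {
    "correctness",
    "tool_selection_accuracy",
    "tool_execution_accuracy",
    "logical_consistency",
}


@dataclass
class EvalPlan:
    """Hypothetical container for the choices made before an evaluation run."""
    agent: str      # fully qualified agent name (assumed format)
    dataset: str    # table or view holding the evaluation questions (assumed)
    metrics: list = field(default_factory=lambda: sorted(DOCUMENTED_METRICS))

    def validate(self):
        """Return a list of human-readable problems; empty means ready to run."""
        problems = []
        if not self.agent.strip():
            problems.append("agent is not set")
        if not self.dataset.strip():
            problems.append("dataset is not set")
        if not self.metrics:
            problems.append("no metrics selected")
        unknown = sorted(set(self.metrics) - DOCUMENTED_METRICS)
        if unknown:
            problems.append("unknown metrics: " + ", ".join(unknown))
        return problems


# Example object names below are placeholders, not real Snowflake objects.
plan = EvalPlan(
    agent="MY_DB.MY_SCHEMA.SUPPORT_AGENT",
    dataset="MY_DB.MY_SCHEMA.AGENT_EVAL_QUESTIONS",
    metrics=["correctness", "latency"],
)
print(plan.validate())
```

Validating the plan up front catches a mistyped metric name before an evaluation run is started and results are reviewed in Snowsight.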
Dependency Matrix
Required Modules
None required
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: evaluate-cortex-agent
Download link: https://github.com/randoneering/nix-flake-mirror/archive/main.zip#evaluate-cortex-agent
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.