vision-mcp

Community

Speed up desktop automation with reusable maps

Author: Haruhiyuki
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Vision-MCP reduces the repetitive cost of desktop GUI tasks by distilling each interaction path (screenshots, coordinate estimation, AX/OCR lookups, and click sequences) into a reusable vision-mcp.yaml map that can be replayed with run_workflow. On later runs the agent skips the exhaustive visual exploration, cutting the time spent on that exploration from minutes to seconds. The approach assumes the agent can already perform visual operations: the map is an amortized automation layer on top of existing computer-use skills, not a replacement for them.
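To make the amortization idea concrete, here is a minimal sketch of replaying a stored interaction path. The Step shape, the replay_workflow helper, the "export_pdf" workflow name, and the coordinates are all hypothetical illustrations. This page does not describe the real vision-mcp.yaml schema or the run_workflow signature, so both may differ.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory shape of one map entry; the real vision-mcp.yaml
# schema is not documented here and may look different.
@dataclass(frozen=True)
class Step:
    target: str  # human-readable name of the UI element
    x: int       # screen coordinates recorded during the first (visual) run
    y: int

def replay_workflow(workflows: dict[str, list[Step]], name: str,
                    click: Callable[[int, int], None]) -> int:
    """Replay recorded steps with no screenshot or OCR work; return steps run."""
    steps = workflows[name]
    for step in steps:
        click(step.x, step.y)
    return len(steps)

# Demo with a fake click function that only records where it was called.
clicks: list[tuple[int, int]] = []
workflows = {
    "export_pdf": [Step("File menu", 24, 12), Step("Export as PDF", 60, 180)],
}
count = replay_workflow(workflows, "export_pdf",
                        lambda x, y: clicks.append((x, y)))
```

The click function is passed in, so the same replay logic can drive a real input backend or a recorder like the one above.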

Core Features & Use Cases

  • Path-to-map conversion: every visual operation path is captured and distilled into a reusable map.
  • Workflow reuse: after the first run, subsequent tasks call run_workflow with near-zero visual cost.
  • Cross-platform support: macOS and Windows compatibility with platform-specific handling and safety checks.
  • Patchable maps: supports patches and repair to keep maps reliable as the UI changes.
  • Guided exploration vs task-driven operation: supports both exploration-driven map-building and task-driven execution.
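The patchable-maps feature can be sketched as a small merge over stored steps. This is only an illustration under assumptions: the step dictionaries, the apply_patch helper, and the patch format keyed by target name are hypothetical and are not taken from the vision-mcp source.

```python
def apply_patch(steps: list[dict], patch: dict[str, dict]) -> list[dict]:
    """Return a new step list with patched fields merged in by target name."""
    # Hypothetical patch format: {target name: {field: new value}}.
    return [{**step, **patch.get(step["target"], {})} for step in steps]

steps = [
    {"target": "File menu", "x": 24, "y": 12},
    {"target": "Export as PDF", "x": 60, "y": 180},
]
# Suppose a UI update moved the menu item down; only that field is overridden.
patched = apply_patch(steps, {"Export as PDF": {"y": 204}})
```

Returning a new list leaves the original map untouched, so a failed repair can be discarded without losing the last known-good path.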

Quick Start

Run a vision-mcp workflow with an existing map (via run_workflow) to execute a desktop task end-to-end.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: vision-mcp
Download link: https://github.com/Haruhiyuki/vision-mcp/archive/main.zip#vision-mcp

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
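For a manual install, the extraction step can be sketched as below. The install_skill helper is hypothetical, not part of vision-mcp. It assumes the archive follows the usual GitHub layout with a single top-level folder (for example vision-mcp-main/), which it strips so the files land directly in .claude/skills/vision-mcp.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath

def install_skill(zip_bytes: bytes, skills_dir: Path, name: str) -> Path:
    """Extract an archive into skills_dir/name, dropping its top-level folder."""
    dest = skills_dir / name
    dest_root = dest.resolve()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Drop the first path component, e.g. "vision-mcp-main/".
            parts = PurePosixPath(info.filename).parts[1:]
            if not parts or info.is_dir():
                continue
            target = dest.joinpath(*parts)
            # Refuse entries that would escape the destination directory.
            if dest_root not in target.resolve().parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
    return dest
```

The archive bytes could be fetched with urllib.request.urlopen(url).read() using the download link above, then passed in as install_skill(data, Path(".claude/skills"), "vision-mcp").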