vision-mcp
Speed up desktop automation with reusable maps
System Documentation
What problem does it solve?
Vision-MCP reduces the heavy, repetitive cost of desktop GUI tasks by turning each interaction path (screenshots, coordinate estimation, AX/OCR lookups, and click sequences) into a reusable vision-mcp.yaml map that can be replayed with run_workflow. On subsequent runs, agents can skip exhaustive visual exploration, cutting a task's initial setup time from minutes to seconds. The approach assumes the agent can already perform visual operations: the modular map is an amortized automation layer on top of existing computer-use skills, not a replacement for them.
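The amortization idea can be sketched in Python. This is a minimal illustration under stated assumptions, not vision-mcp's implementation: `Step`, `InteractionMap`, `get_or_explore`, and `slow_visual_exploration` are hypothetical names, and an in-memory dict stands in for the vision-mcp.yaml map. The expensive exploration runs only on the first request for a task; later requests replay the stored path.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical step record; the real vision-mcp.yaml schema may differ.
@dataclass(frozen=True)
class Step:
    action: str      # e.g. "click" or "type"
    target: str      # element label resolved via AX/OCR during exploration
    x: int = 0       # screen coordinates recorded on the first run
    y: int = 0
    text: str = ""   # payload for "type" steps


@dataclass
class InteractionMap:
    """Hypothetical in-memory stand-in for a map: task name -> recorded steps."""
    workflows: dict[str, list[Step]] = field(default_factory=dict)
    explore_calls: int = 0

    def get_or_explore(self, task: str, explore: Callable[[str], list[Step]]) -> list[Step]:
        # Map hit: reuse the stored path with no visual exploration.
        if task in self.workflows:
            return self.workflows[task]
        # Map miss: pay the visual cost once, then keep the distilled path.
        self.explore_calls += 1
        steps = explore(task)
        self.workflows[task] = steps
        return steps


def slow_visual_exploration(task: str) -> list[Step]:
    # Placeholder for screenshot -> coordinate estimation -> AX/OCR -> clicks.
    return [Step("click", "Save button", 840, 612)]


m = InteractionMap()
first = m.get_or_explore("save-document", slow_visual_exploration)
second = m.get_or_explore("save-document", slow_visual_exploration)
print(m.explore_calls)  # 1
```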
Core Features & Use Cases
- Path-to-map conversion: every visual operation path is captured and distilled into a reusable map.
- Workflow reuse: after the first run, subsequent tasks call run_workflow at near-zero visual cost.
- Cross-platform support: works on macOS and Windows, with platform-specific handling and safety checks.
- Patchable maps: maps can be patched and repaired so they stay reliable as the UI changes.
- Guided exploration vs task-driven operation: supports both exploration-driven map-building and task-driven execution.
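Map repair can be sketched the same way, assuming each recorded step stores a target label and coordinates. The hypothetical `repair_steps` checks every step against the current UI and re-locates only the steps that fail, so a moved button costs one visual lookup instead of a full re-exploration. `verify` and `relocate` are simulated with a dict here; in practice they would be screenshot, AX, or OCR checks. None of these names come from vision-mcp itself.

```python
from dataclasses import dataclass, replace
from typing import Callable


# Hypothetical step record; the real map schema may differ.
@dataclass(frozen=True)
class Step:
    action: str
    target: str
    x: int
    y: int


def repair_steps(
    steps: list[Step],
    verify: Callable[[Step], bool],
    relocate: Callable[[Step], tuple[int, int]],
) -> tuple[list[Step], int]:
    """Return a patched copy of `steps` plus the number of steps repaired."""
    patched: list[Step] = []
    repairs = 0
    for step in steps:
        if verify(step):
            # Recorded position still matches the UI: keep the step unchanged.
            patched.append(step)
            continue
        # UI changed: fall back to a visual lookup for this element only.
        x, y = relocate(step)
        patched.append(replace(step, x=x, y=y))
        repairs += 1
    return patched, repairs


# Simulated current UI: the OK button has moved since the map was recorded.
current_ui = {"File menu": (20, 10), "OK button": (500, 400)}


def verify(step: Step) -> bool:
    return current_ui.get(step.target) == (step.x, step.y)


def relocate(step: Step) -> tuple[int, int]:
    return current_ui[step.target]


old_map = [Step("click", "File menu", 20, 10), Step("click", "OK button", 480, 390)]
new_map, n = repair_steps(old_map, verify, relocate)
print(n)           # 1
print(new_map[1])  # Step(action='click', target='OK button', x=500, y=400)
```

Returning a new list rather than mutating the old one keeps the previous map available if the repair itself turns out to be wrong.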
Quick Start
Run a vision-mcp workflow against an existing map to execute a desktop task end to end.
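For illustration, replaying a stored workflow amounts to dispatching each recorded step to an executor in order. The sketch below assumes a simple map layout (JSON, to stay within the standard library) and a logging executor; `replay_workflow`, `RecordingExecutor`, and the step fields are hypothetical and do not describe the actual vision-mcp.yaml schema or the parameters of run_workflow.

```python
import json
from typing import Callable

# Hypothetical map layout; the real vision-mcp.yaml schema may differ.
MAP_TEXT = """
{
  "workflows": {
    "export-pdf": [
      {"action": "hotkey", "keys": ["ctrl", "p"]},
      {"action": "click", "target": "PDF dropdown", "x": 120, "y": 640},
      {"action": "type", "text": "report.pdf"},
      {"action": "click", "target": "Save", "x": 710, "y": 520}
    ]
  }
}
"""


class RecordingExecutor:
    """Stand-in executor that logs actions instead of driving a real desktop."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")

    def hotkey(self, keys: list[str]) -> None:
        self.log.append("hotkey(" + "+".join(keys) + ")")


def replay_workflow(map_data: dict, name: str, ex: RecordingExecutor) -> int:
    """Execute every step of workflow `name` in order; return the step count."""
    handlers: dict[str, Callable[[dict], None]] = {
        "click": lambda s: ex.click(s["x"], s["y"]),
        "type": lambda s: ex.type_text(s["text"]),
        "hotkey": lambda s: ex.hotkey(s["keys"]),
    }
    steps = map_data["workflows"][name]
    for step in steps:
        handler = handlers.get(step["action"])
        if handler is None:
            raise ValueError(f"unsupported action: {step['action']}")
        handler(step)
    return len(steps)


ex = RecordingExecutor()
n = replay_workflow(json.loads(MAP_TEXT), "export-pdf", ex)
print(n, ex.log)
```

Swapping `RecordingExecutor` for a class that sends real input events would turn this dry run into an actual replay; raising on unknown actions keeps a malformed map from silently skipping steps.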
Dependency Matrix
Required Modules
None required
Components
Standard package
Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: vision-mcp Download link: https://github.com/Haruhiyuki/vision-mcp/archive/main.zip#vision-mcp Please download this .zip file, extract it, and install it in the .claude/skills/ directory.