vision-mcp

Name: vision-mcp
Author: Haruhiyuki

Community

Speed up desktop automation with reusable maps

Software Engineering #cross-platform #workflows #maps #patching #desktop-automation #vision-mcp #GUI-automation


Version: 1.0.0


System Documentation

What problem does it solve?

Vision-MCP reduces the heavy, repetitive cost of desktop GUI tasks. It records each interaction path (screenshots, coordinate estimation, AX/OCR lookups, and click sequences) in a reusable vision-mcp.yaml map that can be replayed with run_workflow. On later runs the agent can skip exhaustive visual exploration, cutting setup time for a repeated task from minutes to seconds. The approach assumes the agent can already perform visual operations: the map is an amortized automation layer on top of existing computer-use skills, not a replacement for them.
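The amortization idea can be illustrated with a small record-then-replay cache. This is a minimal sketch, not vision-mcp's implementation: the `Step` and `MapStore` types, the `get_or_explore` method, and the `explore_save_dialog` placeholder are all hypothetical, and the real vision-mcp.yaml schema is not documented here. The point is only that the expensive exploration runs once per workflow and later requests reuse the stored path.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical step shape; the actual vision-mcp.yaml schema may differ.
@dataclass
class Step:
    action: str  # e.g. "click"
    x: int
    y: int


@dataclass
class MapStore:
    """In-memory stand-in for a vision-mcp.yaml map (hypothetical)."""
    workflows: dict[str, list[Step]] = field(default_factory=dict)
    explorations: int = 0  # counts how often the expensive exploration ran

    def get_or_explore(self, name: str, explore: Callable[[], list[Step]]) -> list[Step]:
        # First request for this workflow: pay the visual-exploration cost once
        # and record the resulting path.
        if name not in self.workflows:
            self.explorations += 1
            self.workflows[name] = explore()
        # Later requests: return the recorded path without exploring again.
        return self.workflows[name]


def explore_save_dialog() -> list[Step]:
    # Placeholder for screenshots, coordinate estimation, and AX/OCR lookups.
    return [Step("click", 120, 40), Step("click", 300, 220)]


store = MapStore()
first = store.get_or_explore("save_document", explore_save_dialog)
second = store.get_or_explore("save_document", explore_save_dialog)
print(store.explorations)  # 1: the second call reused the recorded path
```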

Core Features & Use Cases

Path-to-map conversion: every visual operation path is captured and distilled into a reusable map.
Workflow reuse: after the first run, repeat tasks call run_workflow and need almost no further visual exploration.
Cross-platform support: macOS and Windows compatibility with platform-specific handling and safety checks.
Patchable maps: maps can be patched and repaired so they stay reliable as the UI changes.
Guided exploration and task-driven operation: supports both building maps through exploration and executing tasks directly.
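Patching can be pictured as updating only the step that broke instead of re-exploring the whole workflow. The sketch below assumes a hypothetical dictionary layout with per-step `id` fields and a hypothetical `apply_patch` helper; vision-mcp's actual patch format and repair mechanism are not described on this page.

```python
import copy

# Hypothetical map layout; the real vision-mcp.yaml schema may differ.
base_map = {
    "workflows": {
        "export_pdf": [
            {"id": "open_menu", "action": "click", "x": 20, "y": 10},
            {"id": "export", "action": "click", "x": 80, "y": 140},
        ]
    }
}


def apply_patch(map_doc: dict, workflow: str, step_id: str, changes: dict) -> dict:
    """Return a copy of the map with one step's fields updated.

    The input map is left untouched so a bad patch can be discarded.
    """
    patched = copy.deepcopy(map_doc)
    for step in patched["workflows"][workflow]:
        if step["id"] == step_id:
            step.update(changes)
            return patched
    raise KeyError(f"step {step_id!r} not found in workflow {workflow!r}")


# Example: the "Export" menu item moved down after an app update,
# so only that step's coordinates are patched.
new_map = apply_patch(base_map, "export_pdf", "export", {"y": 165})
```

Returning a copy rather than mutating in place is a design choice for the sketch: it keeps the last known-good map available if the patched step still fails.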

Quick Start

Run a vision-mcp workflow against an existing map to execute a desktop task end-to-end.
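Conceptually, replaying a workflow means checking the platform and then executing the recorded steps in order. The sketch below is a hypothetical stand-in, not the real run_workflow tool, whose signature is not documented here. The `replay` function, the step dictionaries, and the injected `do_click` callback are assumptions; injecting the click function keeps the sketch runnable without a real GUI driver.

```python
import sys
from typing import Callable

# Platforms the listing says vision-mcp supports, keyed by sys.platform values.
SUPPORTED_PLATFORMS = {"darwin": "macOS", "win32": "Windows"}


def replay(
    steps: list[dict],
    do_click: Callable[[int, int], None],
    platform: str = sys.platform,
) -> int:
    """Replay recorded click steps and return how many were executed.

    Hypothetical stand-in for run_workflow; the real tool's behavior may differ.
    """
    # Safety check: refuse to run on a platform the map was not built for.
    if platform not in SUPPORTED_PLATFORMS:
        raise RuntimeError(f"unsupported platform: {platform}")
    executed = 0
    for step in steps:
        if step["action"] != "click":
            raise ValueError(f"unknown action: {step['action']}")
        do_click(step["x"], step["y"])
        executed += 1
    return executed


# Usage with a fake driver that only records the clicks it receives.
clicks: list[tuple[int, int]] = []
count = replay(
    [{"action": "click", "x": 20, "y": 10}, {"action": "click", "x": 80, "y": 165}],
    do_click=lambda x, y: clicks.append((x, y)),
    platform="darwin",
)
```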
