glm-vision

Community

Analyze images with GLM-4.6V multimodal vision.

Author: archibate
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Gives the agent visual understanding of user-submitted images: it produces natural-language descriptions, extracts embedded text (OCR), identifies visual elements, and compares multiple images to surface their differences. This reduces the need to inspect images manually.

Core Features & Use Cases

  • Image Description: Generate concise or detailed natural-language descriptions for photos, screenshots, and diagrams.
  • OCR Text Extraction: Extract and preserve textual content from images for search, translation, or copyable output.
  • Image Comparison & Analysis: Compare multiple images to highlight differences, similar objects, or layout changes; supports basic video/frame input.
  • Use Case: Quickly analyze a screenshot to summarize UI elements and extract any visible text for documentation or bug reports.

Quick Start

Please analyze the attached image, describe its main contents, extract any visible text, and list the key objects and colors present.

Dependency Matrix

Required Modules

openai

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: glm-vision
Download link: https://github.com/archibate/archibate-skills/archive/main.zip#glm-vision

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.