glm-vision
CommunityAnalyze images with GLM-4.6V multimodal vision.
Authorarchibate
Version1.0.0
Installs0
System Documentation
What problem does it solve?
Provides reliable visual understanding for user-submitted images by producing natural-language descriptions, extracting embedded text (OCR), identifying visual elements, and comparing multiple images to surface differences and semantics, which removes the need for manual inspection.
Core Features & Use Cases
- Image Description: Generate concise and detailed natural-language descriptions for photos, screenshots, and diagrams.
- OCR Text Extraction: Extract and preserve textual content from images for search, translation, or copyable output.
- Image Comparison & Analysis: Compare multiple images to highlight differences, similar objects, or layout changes; supports basic video/frame input.
- Use Case: Quickly analyze a screenshot to summarize UI elements and extract any visible text for documentation or bug reports.
Quick Start
Please analyze the attached image, describe its main contents, extract any visible text, and list the key objects and colors present.
Dependency Matrix
Required Modules
openai
Components
scriptsreferences
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: glm-vision Download link: https://github.com/archibate/archibate-skills/archive/main.zip#glm-vision Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.