glmv-caption

Official

Accurate captions for images, videos, and files

Author: zai-org
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Produces clear, detailed descriptions and summaries of visual and document content, so users do not need to inspect images, videos, or files manually and describe them in their own words.

Core Features & Use Cases

  • Multimodal captioning: Generate natural-language captions for images, videos, and documents using the ZhiPu GLM-V API.
  • Flexible inputs: Accepts image URLs, local images encoded as base64, and file/video URLs with validation for formats and sizes.
  • Production-friendly output: Returns the raw model output and token usage, supports streaming and custom prompts, and saves results to JSON for auditing.
  • Use Case: Content creators and accessibility teams can automatically produce image alt text, reporters can summarize video content, and researchers can extract visual summaries from batches of files.
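The local-image input path and the JSON audit output described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script: the helper names `encode_local_image` and `save_result`, the allowed extensions, and the 5 MB size limit are all assumptions made for the example, and the real limits and request format should be taken from the skill's own scripts.

```python
import base64
import json
from pathlib import Path

# Assumed limits for illustration only; the skill's real limits are not stated here.
ALLOWED_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def encode_local_image(path: str) -> str:
    """Validate a local image (hypothetical rules) and return it as a base64 string."""
    file = Path(path)
    if file.suffix.lower() not in ALLOWED_IMAGE_SUFFIXES:
        raise ValueError(f"unsupported image format: {file.suffix}")
    data = file.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image too large: {len(data)} bytes")
    return base64.b64encode(data).decode("ascii")


def save_result(result: dict, out_path: str) -> None:
    """Write a caption result (for example raw output plus token usage) to a JSON file."""
    Path(out_path).write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Validating before encoding keeps oversized or unsupported files from being read into a request at all, and writing with `ensure_ascii=False` keeps non-English captions readable in the saved audit file.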

Quick Start

Provide an image URL or upload an image and ask the skill to "Generate a detailed caption describing the visual content and salient elements."

Dependency Matrix

Required Modules

requests

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: glmv-caption
Download link: https://github.com/zai-org/GLM-skills/archive/main.zip#glmv-caption

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.