glmv-grounding

Official

Locate and visualize targets in images and videos

Author: zai-org
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Extracts the locations of prompt-specified targets from images and videos, then converts the model outputs into normalized coordinates and visual overlays. Users can localize, annotate, and inspect visual targets automatically, without manual pixel math.

Core Features & Use Cases

  • Parse GLM-V grounding outputs into standard formats (2D bounding boxes, 2D points, polygons, 3D boxes, and video tracking JSON) normalized to the 0–1000 coordinate range.
  • Visualize results on images and videos with configurable labeling, colors, and thickness, and reverse-normalize coordinates to pixel space for downstream use.
  • Support video object tracking extraction and per-second multi-object tracking (MOT) visualization, with URL safety checks and environment-based API configuration for production workflows.
  • Use Case: Automatically find and label all instances of "people wearing red jackets" in a surveillance clip, return normalized coordinates per second, and produce an annotated video for review.
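
The reverse-normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the skill's own code: the function name `denormalize_box` is hypothetical, and the `[x1, y1, x2, y2]` corner ordering is an assumption about the box layout. Only the 0–1000 range comes from the description above.

```python
from typing import Sequence, Tuple

NORM_MAX = 1000  # upper bound of the normalized coordinate range


def denormalize_box(
    box: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Map a box from the 0-1000 range to pixel coordinates.

    Assumes (hypothetically) that the box is ordered [x1, y1, x2, y2].
    """
    x1, y1, x2, y2 = box
    # Order the corners so the result is always a well-formed box.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))

    def scale(value: float, size: int) -> int:
        # Scale to pixels, then clamp to the image bounds.
        pixel = round(value * size / NORM_MAX)
        return max(0, min(size, pixel))

    return (
        scale(x1, width),
        scale(y1, height),
        scale(x2, width),
        scale(y2, height),
    )


if __name__ == "__main__":
    # A box covering part of a 1920x1080 frame.
    print(denormalize_box([100, 200, 500, 800], 1920, 1080))
```

Clamping keeps slightly out-of-range model outputs inside the frame, which avoids errors when the pixel box is later passed to a drawing or cropping routine.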

Quick Start

Run the grounding CLI with an image URL and a concise prompt to receive normalized 0–1000 coordinates and optional visualizations.

Dependency Matrix

Required Modules

requests, Pillow, opencv-python, numpy, matplotlib, decord

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: glmv-grounding
Download link: https://github.com/zai-org/GLM-skills/archive/main.zip#glmv-grounding

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.