glmv-grounding
Official
Locate and visualize targets in images and videos
Software Engineering #visualization #grounding #coordinates #object-detection #visual-grounding #video-tracking #glm-v
Author: zai-org
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Extracts the locations of prompt-specified targets from images and videos, then converts the model's output into normalized coordinates and visual overlays. Users can localize, annotate, and inspect visual targets automatically, without manual pixel math.
Core Features & Use Cases
- Parse GLM-V grounding outputs into standard formats (2D bounding boxes, 2D points, polygons, 3D boxes, and video tracking JSON) normalized to the 0–1000 coordinate range.
- Visualize results on images and videos with configurable labeling, colors, and thickness, and reverse-normalize coordinates to pixel space for downstream use.
- Support video object tracking extraction and per-second multi-object tracking (MOT) visualization, with URL safety checks and environment-based API configuration for production workflows.
- Use Case: Automatically find and label all instances of "people wearing red jackets" in a surveillance clip, return normalized coordinates per second, and produce an annotated video for review.
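The reverse-normalization step mentioned above can be illustrated with a minimal sketch. It assumes a 2D box arrives as `[x1, y1, x2, y2]` in the 0–1000 range; the function name `denormalize_box` and the clamping behavior are illustrative assumptions, not the skill's actual API.

```python
from typing import Sequence, Tuple

NORM_MAX = 1000  # upper bound of the normalized coordinate range


def denormalize_box(
    box: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Map a 0-1000 normalized [x1, y1, x2, y2] box to pixel coordinates.

    Hypothetical helper for illustration; the skill's own scripts may
    round or clamp differently.
    """
    x1, y1, x2, y2 = box

    def to_px(value: float, size: int) -> int:
        # Clamp out-of-range model output before scaling to the image size.
        value = min(max(value, 0), NORM_MAX)
        return round(value * size / NORM_MAX)

    return (to_px(x1, width), to_px(y1, height),
            to_px(x2, width), to_px(y2, height))


# Example: a box on a 1920x1080 frame.
print(denormalize_box([100, 250, 500, 750], 1920, 1080))
```

The same scaling applies to points and polygon vertices: x values scale by image width, y values by image height.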
Quick Start
Run the grounding CLI with an image URL and a concise prompt to receive normalized 0–1000 coordinates and optional visualizations.
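Because the CLI takes an image URL, a pre-flight check in the spirit of the URL safety checks listed under Core Features might look like the sketch below. The listing does not describe the skill's actual checks, so the function name `is_probably_safe_url` and the specific rules (HTTP(S) only, no localhost, no non-public IP literals) are assumptions for illustration.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_probably_safe_url(url: str) -> bool:
    """Reject URLs that are clearly unsuitable for remote fetching.

    Hypothetical helper: it inspects only the URL text and does not
    resolve DNS, so it is a first filter, not a complete defense.
    """
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name rather than an IP literal
    # Block loopback, private, link-local, and other non-public ranges.
    return ip.is_global


print(is_probably_safe_url("https://example.com/cat.jpg"))
print(is_probably_safe_url("http://127.0.0.1/cat.jpg"))
```

A production check would also need to validate the address a hostname resolves to, since a public-looking name can point at an internal IP.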
Dependency Matrix
Required Modules
requests, Pillow, opencv-python, numpy, matplotlib, decord
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: glmv-grounding
Download link: https://github.com/zai-org/GLM-skills/archive/main.zip#glmv-grounding
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.