qwencloud-vision
Official
Turn images and videos into actionable insights.
Author: QwenCloud
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Analyze images and videos using Qwen VL and VL-OCR models to understand scenes, extract text, answer questions, and produce structured outputs for automation and agents.
Core Features & Use Cases
- Image and video understanding (including thinking-mode support) for descriptions, Q&A, and reasoning.
- OCR text extraction with structured data outputs and language support.
- Multi-image comparison and visual reasoning for charts, scenes, and visual problems.
- JSON Schema or JSON object outputs for easy integration with pipelines and agents.
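As an illustration of how structured outputs could feed a pipeline, the sketch below checks a JSON object returned by a model against a small schema before passing it on. It is not taken from the skill itself: the receipt schema, its field names, and the helper check_against_schema are hypothetical, and the check covers only the "required" and "type" keywords rather than the full JSON Schema specification.

```python
import json

# Hypothetical schema for a receipt-extraction task; field names are illustrative only.
RECEIPT_SCHEMA = {
    "type": "object",
    "required": ["merchant", "total", "currency"],
    "properties": {
        "merchant": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_TYPE_MAP = {"string": str, "number": (int, float), "object": dict}


def check_against_schema(raw_text, schema):
    """Parse model output and return a list of problems (empty if it conforms).

    Only the 'required' and 'type' keywords are checked.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data:
            expected = _TYPE_MAP[spec["type"]]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(data[key], bool) or not isinstance(data[key], expected):
                problems.append(f"wrong type for field: {key}")
    return problems
```

A caller might retry the request or route the item to manual review whenever the returned list is non-empty. For full schema validation, a dedicated JSON Schema validator would be a better fit than this hand-rolled check.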
Quick Start
Describe an image or video by running python scripts/analyze.py with a prompt and the media file.
Dependency Matrix
Required Modules
alibabacloud-oss-v2
Components
scripts
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: qwencloud-vision
Download link: https://github.com/QwenCloud/qwencloud-ai/archive/main.zip#qwencloud-vision
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
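For readers who prefer to install by hand, the extract-and-install step can be sketched in Python. This is a minimal sketch under stated assumptions, not the official installer: it assumes the downloaded archive contains a folder named qwencloud-vision somewhere inside it, and the helper extract_skill is a hypothetical name introduced here for illustration.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_source, skill_name, skills_dir):
    """Copy files found under a folder named `skill_name` in the archive
    into skills_dir/skill_name and return the list of written paths.

    Assumption: the skill lives in a folder whose name equals `skill_name`.
    """
    target_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Keep only files that sit below a directory named skill_name.
            if skill_name not in parts[:-1]:
                continue
            relative = parts[parts.index(skill_name) + 1:]
            # Skip entries that would escape the target directory.
            if ".." in relative:
                continue
            destination = target_root.joinpath(*relative)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            written.append(destination)
    return written
```

With the archive saved locally as main.zip, a call such as extract_skill("main.zip", "qwencloud-vision", ".claude/skills") would place the skill's files under .claude/skills/qwencloud-vision/.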