qwencloud-vision

Official

Turn images and videos into actionable insights.

AuthorQwenCloud
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Analyze images and videos using Qwen VL and VL-OCR models to understand scenes, extract text, answer questions, and produce structured outputs for automation and agents.

Core Features & Use Cases

  • Image and video understanding (including thinking-mode support) for descriptions, Q&A, and reasoning.
  • OCR text extraction with structured data outputs and language support.
  • Multi-image comparison and visual reasoning for charts, scenes, and visual problems.
  • JSON Schema or JSON object outputs for easy integration with pipelines and agents.

Quick Start

Describe an image or video by running python scripts/analyze.py with a prompt and the media file.

Dependency Matrix

Required Modules

alibabacloud-oss-v2

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: qwencloud-vision
Download link: https://github.com/QwenCloud/qwencloud-ai/archive/main.zip#qwencloud-vision

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.