nvidia-vlm
CommunityAI-powered vision language model skill.
AuthorZenodia
Version1.0.0
Installs0
System Documentation
What problem does it solve?
This Skill empowers AI agents to understand and interpret visual information from images, bridging the gap between visual data and natural language processing.
Core Features & Use Cases
- Image Analysis: Analyze images to understand their content, scene, and context.
- Detailed Descriptions: Generate comprehensive descriptions of images, including objects, colors, and text.
- OCR: Extract text from images for data processing and accessibility.
- Visual Question Answering: Answer specific questions about image content.
- Use Case: Upload a photo of a street scene and ask the AI to identify all the vehicles and their colors, or to read any signs present.
Quick Start
Use the nvidia-vlm skill to analyze the image located at 'path/to/your/image.jpg'.
Dependency Matrix
Required Modules
openaiPyYAMLlangchainpydanticPillow
Components
scriptsassets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: nvidia-vlm Download link: https://github.com/Zenodia/agentic-context-engineering-optimization/archive/main.zip#nvidia-vlm Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.