multimodal-llm

Community

Unlock vision, audio, and video AI capabilities.

Authoryonatangross
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill enables seamless integration of advanced multimodal AI capabilities, allowing you to process images, transcribe audio, generate speech, and create AI-generated video content.

Core Features & Use Cases

  • Image Analysis: Understand and describe images, extract data from documents and charts.
  • Audio Processing: Transcribe speech to text, generate natural-sounding speech from text.
  • Video Generation: Create AI-powered videos using cutting-edge models like Kling, Sora, and Veo.
  • Use Case: Build an AI assistant that can describe images uploaded by users, transcribe meeting recordings, and generate short promotional videos for products.

Quick Start

Use the multimodal-llm skill to describe the provided image and generate a short video based on a text prompt.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: multimodal-llm
Download link: https://github.com/yonatangross/orchestkit/archive/main.zip#multimodal-llm

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.