Name: multimodal-llm
Author: yonatangross

System Documentation

What problem does it solve?

This Skill adds multimodal AI capabilities to your workflow: it can process images, transcribe audio, generate speech, and create AI-generated video content.

Core Features & Use Cases

Image Analysis: Understand and describe images, and extract data from documents and charts.
Audio Processing: Transcribe speech to text and generate natural-sounding speech from text.
Video Generation: Create AI-generated videos with models such as Kling, Sora, and Veo.
Use Case: Build an AI assistant that describes images uploaded by users, transcribes meeting recordings, and generates short promotional videos for products.

Quick Start

Use the multimodal-llm skill to describe the provided image and generate a short video based on a text prompt.

Please help me install this Skill:
Name: multimodal-llm
Download link: https://github.com/yonatangross/orchestkit/archive/main.zip#multimodal-llm
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
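For readers who would rather do the installation by hand, the download, extract, and install steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's installer: the function names `download_archive` and `install_skill_from_zip` are hypothetical, and the code assumes the archive contains a folder named exactly `multimodal-llm` somewhere inside it (the repository layout is not described here, so the folder is located by name rather than by a fixed path).

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path


def download_archive(url, dest):
    """Download the archive to dest (hypothetical helper; needs network access)."""
    # Drop the "#multimodal-llm" fragment; it is not part of the HTTP request.
    clean_url = urllib.parse.urldefrag(url).url
    urllib.request.urlretrieve(clean_url, dest)
    return Path(dest)


def install_skill_from_zip(zip_path, skill_name, skills_dir):
    """Copy the folder named skill_name from the archive into skills_dir."""
    skills_dir = Path(skills_dir)
    target = skills_dir / skill_name
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumption: the skill lives in a folder named after the skill.
        # If several folders match, prefer the one closest to the archive root.
        candidates = sorted(
            (p for p in Path(tmp).rglob(skill_name) if p.is_dir()),
            key=lambda p: len(p.parts),
        )
        if not candidates:
            raise FileNotFoundError(
                f"no folder named {skill_name!r} found in {zip_path}"
            )
        skills_dir.mkdir(parents=True, exist_ok=True)
        if target.exists():
            shutil.rmtree(target)  # replace any previous install
        shutil.copytree(candidates[0], target)
    return target
```

A typical call, assuming the project-level skills directory, would be `install_skill_from_zip(download_archive(url, "main.zip"), "multimodal-llm", ".claude/skills")`, where `url` is the download link given above. Replacing an existing folder outright keeps the sketch short; a more careful installer might back it up first.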
