meow:multimodal

Community

Multimodal analysis and generation with Gemini.

Author: ngocsangyem
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill integrates Gemini-based multimodal analysis and generation into MeowKit so it can handle images, videos, audio, and documents. It enables automated understanding, transcription, OCR, and media creation for binary content that Claude Code would otherwise struggle to process.

Core Features & Use Cases

  • Multimodal analysis: analyze, describe, run OCR on, and extract information from images, videos, audio, PDFs, and other documents.
  • Transcription & OCR: automatic transcription of audio/video and OCR for images and scanned documents.
  • Data extraction: extract structured content from documents (tables, text) and produce Markdown or structured outputs.
  • Media generation: generate images (Imagen) and videos (Veo) from prompts for design, mockups, or visual storytelling.
  • Auto-activation: activates automatically on file references or prompts like "analyze", "describe", "transcribe", "extract from" or "generate image/video".
  • Security & environment: requires the GEMINI_API_KEY environment variable and runs under MeowKit's security anchor system, which keeps outputs within policy bounds.
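The auto-activation rule above can be pictured as a simple check: activate when the prompt references a media file or contains one of the listed trigger phrases. The sketch below is a hypothetical illustration, not the skill's actual matcher. The trigger patterns come from the phrases quoted above, while the extension set and the helper names `mentions_media_file` and `should_activate` are assumptions.

```python
import re
from pathlib import PurePath

# Hypothetical trigger patterns based on the phrases listed in the skill description.
TRIGGER_PATTERNS = [
    r"\banalyze\b",
    r"\bdescribe\b",
    r"\btranscribe\b",
    r"\bextract\b.*\bfrom\b",
    r"\bgenerate (?:an? )?(?:images?|videos?)\b",
]

# Hypothetical set of media extensions; the real skill may recognize others.
MEDIA_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".webp", ".gif",  # images
    ".mp4", ".mov", ".webm",                   # video
    ".mp3", ".wav", ".m4a", ".flac",           # audio
    ".pdf",                                    # documents
}


def mentions_media_file(prompt: str) -> bool:
    """Return True if any whitespace-separated token looks like a media file path."""
    for token in prompt.split():
        # Drop surrounding quotes and punctuation such as "'diagram.png'?".
        cleaned = token.strip("\"'`,;:()!?.")
        if PurePath(cleaned).suffix.lower() in MEDIA_EXTENSIONS:
            return True
    return False


def should_activate(prompt: str) -> bool:
    """Activate on a media file reference or on one of the trigger phrases."""
    if mentions_media_file(prompt):
        return True
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in TRIGGER_PATTERNS)
```

Matching on file extensions keeps the check cheap and independent of the API; a production matcher would likely also consider the conversation context rather than a single prompt.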

Quick Start

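Provide a media file path and a task (for example "transcribe", "describe", or "extract from") to run multimodal analysis, or a prompt describing the image or video you want to generate.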
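A minimal sketch of the analysis flow, assuming the google-genai and python-dotenv packages listed under Required Modules. It calls the Gemini API directly rather than the skill's bundled scripts, whose interface is not documented here. The model name `gemini-2.5-flash` and the helper names `require_api_key` and `analyze_media` are assumptions; check the Gemini documentation for current model IDs.

```python
import os
from pathlib import Path

# Assumed default model; substitute whichever Gemini model your account supports.
DEFAULT_MODEL = "gemini-2.5-flash"


def require_api_key(env=None):
    """Return GEMINI_API_KEY from the given mapping (default: os.environ) or raise."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key


def analyze_media(path, task, model=DEFAULT_MODEL):
    """Upload a media file to Gemini and return the model's text answer for `task`."""
    media = Path(path)
    if not media.is_file():
        raise FileNotFoundError(str(media))

    # Third-party imports are deferred so the checks above work without them.
    from dotenv import load_dotenv  # python-dotenv
    from google import genai        # google-genai

    load_dotenv()  # read GEMINI_API_KEY from a local .env file, if one exists
    client = genai.Client(api_key=require_api_key())
    uploaded = client.files.upload(file=str(media))
    response = client.models.generate_content(model=model, contents=[uploaded, task])
    return response.text
```

Usage, with a hypothetical file name: `print(analyze_media("scan.pdf", "Extract all tables as Markdown."))`. Uploading through the Files API keeps large videos and audio out of the request body.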

Dependency Matrix

Required Modules

google-genai, python-dotenv
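The two required modules are installed under their pip names but imported as `google.genai` and `dotenv`. The following sketch is a hypothetical pre-flight check, not part of the skill, that reports which of them cannot be imported. The mapping and the helper name `missing_dependencies` are illustrative assumptions.

```python
import importlib.util

# Pip distribution name -> Python import name (they differ for both packages).
REQUIRED_MODULES = {
    "google-genai": "google.genai",
    "python-dotenv": "dotenv",
}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the pip names of required packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports the parent "a" first and raises if it is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Install with: pip install " + " ".join(absent))
    else:
        print("All required modules are available.")
```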

Components

scripts, references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: meow:multimodal
Download link: https://github.com/ngocsangyem/MeowKit/archive/main.zip#meow-multimodal

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
