meow:multimodal
Community
Multimodal analysis and generation with Gemini.
Software Engineering
#ocr #multimodal #transcription #gemini #image-analysis #video-generation #pdf-extraction
Author: ngocsangyem
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
MeowKit integrates Gemini-based multimodal analysis and generation to handle images, video, audio, and documents. It automates understanding, transcription, OCR, and media creation for binary content that Claude Code would otherwise struggle to process.
Core Features & Use Cases
- Multimodal analysis: analyze, describe, OCR, and extract information from images, videos, audio, PDFs, and other documents.
- Transcription & OCR: automatic transcription of audio/video and OCR for images and scanned documents.
- Data extraction: extract structured content from documents (tables, text) and produce Markdown or structured outputs.
- Media generation: generate images (Imagen) and videos (Veo) from prompts for design, mockups, or visual storytelling.
- Auto-activation: activates automatically on file references or prompts like "analyze", "describe", "transcribe", "extract from" or "generate image/video".
- Security & env: requires the GEMINI_API_KEY environment variable and runs under MeowKit's security anchor system, which keeps outputs within policy.
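Since the skill depends on GEMINI_API_KEY and lists python-dotenv as a required module, a key check along the following lines may be useful before any request is made. This is a minimal sketch, not the skill's actual code: the function name `get_gemini_api_key` and the error message are hypothetical, and it assumes the key lives either in the process environment or in a `.env` file.

```python
import os
from typing import Mapping, Optional

try:
    # python-dotenv: load_dotenv() copies variables from a .env file into os.environ.
    from dotenv import load_dotenv
except ImportError:
    # Keeps the sketch importable when python-dotenv is not installed.
    load_dotenv = None


def get_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` can be passed explicitly
    to make the check easy to test.
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()
        env = os.environ
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it or add it to a .env file"
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error surfacing later from the API client.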
Quick Start
Provide a media file path and a task to run multimodal analysis or generation.
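As an illustration of what "a media file path and a task" could translate to, the sketch below sends a local file plus a text instruction to Gemini through the google-genai SDK. It is an assumption about how such a call might look, not the skill's bundled scripts: the helper names `guess_mime_type` and `analyze_file`, the default model name `gemini-2.5-flash`, and the example file names are all hypothetical choices.

```python
import mimetypes
from pathlib import Path


def guess_mime_type(path: str) -> str:
    """Infer the MIME type from the file extension, failing loudly if unknown."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot determine MIME type for {path!r}")
    return mime


def analyze_file(path: str, task: str, api_key: str,
                 model: str = "gemini-2.5-flash") -> str:
    """Send one file and one instruction to Gemini and return the text reply.

    Hypothetical helper; the model name is an assumed default.
    """
    # Imported lazily so guess_mime_type() works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    part = types.Part.from_bytes(
        data=Path(path).read_bytes(),
        mime_type=guess_mime_type(path),
    )
    response = client.models.generate_content(model=model, contents=[part, task])
    return response.text or ""


if __name__ == "__main__":
    import os
    # Example file name and task are placeholders.
    print(analyze_file("invoice.pdf", "Extract the tables as Markdown.",
                       os.environ["GEMINI_API_KEY"]))
```

Passing the file as inline bytes suits small inputs; larger media would more likely go through the SDK's file upload facility instead.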
Dependency Matrix
Required Modules
google-genai, python-dotenv
Components
scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: meow:multimodal
Download link: https://github.com/ngocsangyem/MeowKit/archive/main.zip#meow-multimodal
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.