meow:multimodal

Name: meow:multimodal
Author: ngocsangyem

Community

Multimodal analysis and generation with Gemini.

Software Engineering #ocr #multimodal #transcription #gemini #image-analysis #video-generation #pdf-extraction

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This skill integrates Gemini-based multimodal analysis and generation into MeowKit. It handles images, video, audio, and documents, providing automated understanding, transcription, OCR, and media creation for binary content that Claude Code would otherwise struggle to process.

Core Features & Use Cases

Multimodal analysis: analyze, describe, OCR, and extract information from images, videos, audio, PDFs, and other documents.
Transcription & OCR: automatic transcription of audio/video and OCR for images and scanned documents.
Data extraction: extract structured content from documents (tables, text) and produce Markdown or structured outputs.
Media generation: generate images (Imagen) and videos (Veo) from prompts for design, mockups, or visual storytelling.
Auto-activation: activates automatically on file references or prompts like "analyze", "describe", "transcribe", "extract from" or "generate image/video".
Security & env: requires GEMINI_API_KEY and runs under MeowKit's security anchor system, which keeps outputs within policy.
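
The auto-activation and environment requirements above could be approximated as follows. This is a minimal sketch, not MeowKit's actual logic: the function names `should_activate` and `has_api_key`, the trigger tuple, and the list of media extensions are all hypothetical, chosen only to mirror the trigger phrases and the GEMINI_API_KEY requirement listed in the features.

```python
import os
import re

# Hypothetical trigger phrases, copied from the examples in the feature list.
TRIGGERS = (
    "analyze", "describe", "transcribe", "extract from",
    "generate image", "generate video",
)

# Hypothetical set of extensions treated as a media file reference.
MEDIA_REF = re.compile(
    r"\S+\.(png|jpe?g|gif|webp|mp4|mov|mp3|wav|pdf)\b", re.IGNORECASE
)


def should_activate(prompt: str) -> bool:
    """Return True if the prompt has a trigger phrase or a media file reference."""
    lowered = prompt.lower()
    if any(trigger in lowered for trigger in TRIGGERS):
        return True
    return MEDIA_REF.search(prompt) is not None


def has_api_key(env=os.environ) -> bool:
    """Return True if GEMINI_API_KEY is set to a non-empty value."""
    return bool(env.get("GEMINI_API_KEY", "").strip())
```

Passing the environment as a parameter keeps the key check easy to exercise without touching the real process environment.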

Quick Start

Provide a media file path and a task to run multimodal analysis or generation.
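
For orientation, a file path plus a task could be turned into a Gemini request along these lines. This is a sketch under stated assumptions rather than the skill's bundled scripts: `build_request`, `analyze`, and the default model name are illustrative, and the call pattern is the general `google-genai` SDK usage for sending inline bytes, which suits small files only.

```python
import mimetypes
import os
from pathlib import Path

# Assumed model name; substitute whichever Gemini model your key can access.
DEFAULT_MODEL = "gemini-2.0-flash"


def build_request(path: str, task: str) -> tuple[str, str]:
    """Return (mime_type, prompt) for a media file path and a task description."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot determine media type for {path!r}")
    return mime_type, task.strip()


def analyze(path: str, task: str, model: str = DEFAULT_MODEL) -> str:
    """Send the file inline to Gemini and return the response text."""
    # Imported lazily so build_request works without the SDK installed.
    from google import genai
    from google.genai import types

    mime_type, prompt = build_request(path, task)
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    part = types.Part.from_bytes(data=Path(path).read_bytes(), mime_type=mime_type)
    response = client.models.generate_content(model=model, contents=[part, prompt])
    return response.text or ""
```

A call such as `analyze("invoice.pdf", "Extract the tables as Markdown")` would then return the model's text output, provided GEMINI_API_KEY is set and the file is small enough to send inline.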

Dependency Matrix

Required Modules

google-genai, python-dotenv

Components

scripts, references