omnimedia

Community

Multimodal AI for analysis and image generation.

Authorvanducng
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Automates multimodal analysis and generation across audio, images, videos, and documents using Gemini and MiniMax, enabling streamlined insight extraction and content production.

Core Features & Use Cases

  • Multimodal analysis: transcribe audio, OCR text, caption images, classify content, and extract structured data from diverse media formats.
  • Multimodal generation: create images, videos, speech, and music via Gemini, Imagen, OpenRouter, Codex (subscription), and MiniMax within unified workflows.
  • Document workflows: convert documents to Markdown, batch process files, and produce reusable artifacts for dashboards or knowledge bases.

Quick Start

Process a set of media files with Gemini to transcribe audio, describe images, and generate a short video.

Dependency Matrix

Required Modules

google-genaipython-dotenvPillowrequestspypdfpython-docxdocx2pdfmarkdown

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: omnimedia
Download link: https://github.com/vanducng/skills/archive/main.zip#omnimedia

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 510,000+ vetted skills library on demand.