omnimedia
CommunityMultimodal AI for analysis and image generation.
Authorvanducng
Version1.0.0
Installs0
System Documentation
What problem does it solve?
Automates multimodal analysis and generation across audio, images, videos, and documents using Gemini and MiniMax, enabling streamlined insight extraction and content production.
Core Features & Use Cases
- Multimodal analysis: transcribe audio, OCR text, caption images, classify content, and extract structured data from diverse media formats.
- Multimodal generation: create images, videos, speech, and music via Gemini, Imagen, OpenRouter, Codex (subscription), and MiniMax within unified workflows.
- Document workflows: convert documents to Markdown, batch process files, and produce reusable artifacts for dashboards or knowledge bases.
Quick Start
Process a set of media files with Gemini to transcribe audio, describe images, and generate a short video.
Dependency Matrix
Required Modules
google-genaipython-dotenvPillowrequestspypdfpython-docxdocx2pdfmarkdown
Components
scriptsreferences
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: omnimedia Download link: https://github.com/vanducng/skills/archive/main.zip#omnimedia Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
Agent Skills Search Helper
Install a tiny helper to your Agent, search and equip skill from 510,000+ vetted skills library on demand.