vlm-segmentation-engineering
Community
Production VLM & segmentation engineering
Author: AnastasiyaW
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill provides production-oriented engineering guidance for building, integrating, and deploying vision-language models (VLMs), open-vocabulary segmentation pipelines, and diffusion-based image models on GPU infrastructure, with predictable performance and explicit safety trade-offs.
Core Features & Use Cases
- Model selection & pipelines: clear patterns for text→box→mask workflows using SAM3, SAM2.1, Grounding DINO, OWLv2, YOLO-World or hybrid stacks.
- Diffusion engineering: architecture choices (UNet, DiT, Flux), schedulers, VAE handling, text encoder fusion, and recommended fine-tuning paths (LoRA → full fine-tune).
- GPU deployment & optimization: MIG and MPS configurations, memory strategies (AMP/BF16, checkpointing, ZeRO/FSDP), torch.compile trade-offs, and two-instance SAM3 patterns for H100.
- Validation & safety: reproducible benchmarking, license cautions (SAM3, GPL models), encoder-replacement hazards, and guidance for stable inference in production.
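The text→box→mask workflow named in the first feature can be sketched as a two-stage composition: an open-vocabulary detector turns a text prompt into scored boxes, and a promptable segmenter turns each box into a mask. The sketch below is a minimal illustration under stated assumptions, not the Skill's own code. All names (`Detection`, `Instance`, `text_to_instance_masks`, `stub_detect`, `stub_segment`, `min_score`) are hypothetical, and the stubs stand in for real models such as Grounding DINO or SAM, whose actual APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Mask = List[List[bool]]           # row-major binary mask, same size as the image
Image = Sequence[Sequence[int]]   # placeholder image type: rows of pixel values


@dataclass
class Detection:
    label: str
    box: Box
    score: float


@dataclass
class Instance:
    label: str
    box: Box
    score: float
    mask: Mask


def text_to_instance_masks(
    image: Image,
    prompt: str,
    detect: Callable[[Image, str], List[Detection]],
    segment: Callable[[Image, Box], Mask],
    min_score: float = 0.3,  # hypothetical threshold; tune per detector
) -> List[Instance]:
    """Run text -> box -> mask: detect boxes for the prompt, then segment each box."""
    instances = []
    for det in detect(image, prompt):
        if det.score < min_score:
            continue  # drop low-confidence boxes before the costlier mask stage
        instances.append(
            Instance(det.label, det.box, det.score, segment(image, det.box))
        )
    return instances


# Stub stages (assumptions for illustration only; swap in real model wrappers).
def stub_detect(image: Image, prompt: str) -> List[Detection]:
    return [
        Detection(prompt, (1, 1, 3, 3), 0.9),
        Detection(prompt, (0, 0, 1, 1), 0.1),
    ]


def stub_segment(image: Image, box: Box) -> Mask:
    x0, y0, x1, y1 = box
    width, height = len(image[0]), len(image)
    # The stub simply fills the box; a real segmenter would refine the shape.
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


image = [[0] * 4 for _ in range(4)]
instances = text_to_instance_masks(image, "cat", stub_detect, stub_segment)
```

Passing the detector and segmenter as callables keeps the pipeline independent of any one model stack, so a hybrid combination can be swapped in without touching the filtering logic.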
Quick Start
Ask the Skill to design a text-to-instance-mask pipeline using SAM3 or Grounding DINO, specify the target hardware (e.g., H100 with MIG), and request code snippets along with memory and validation steps.
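When a request includes memory planning, a rough first check is weight-only arithmetic: parameter count times bytes per parameter. The sketch below assumes hypothetical names (`BYTES_PER_PARAM`, `weight_memory_gib`) and covers model weights only. Activations, KV caches, optimizer state, and framework overhead are not included, so real usage will be higher.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Example: a hypothetical 1-billion-parameter model.
fp32_gib = weight_memory_gib(1_000_000_000, "fp32")
bf16_gib = weight_memory_gib(1_000_000_000, "bf16")
print(f"fp32: {fp32_gib:.2f} GiB, bf16: {bf16_gib:.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why BF16 is a common first step before heavier techniques such as checkpointing or sharding.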
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: vlm-segmentation-engineering
Download link: https://github.com/AnastasiyaW/claude-code-config/archive/main.zip#vlm-segmentation-engineering
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.