vlm-segmentation-engineering

Community

Production VLM & segmentation engineering

Author: AnastasiyaW
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill provides production-oriented engineering guidance for building, integrating, and deploying vision-language models (VLMs), open-vocabulary segmentation pipelines, and diffusion-based image models on GPU infrastructure, with explicit performance and safety trade-offs.

Core Features & Use Cases

  • Model selection & pipelines: clear patterns for text→box→mask workflows using SAM3, SAM2.1, Grounding DINO, OWLv2, YOLO-World, or hybrid stacks.
  • Diffusion engineering: architecture choices (UNet, DiT, Flux), schedulers, VAE handling, text encoder fusion, and recommended fine-tuning paths (LoRA → full fine-tune).
  • GPU deployment & optimization: MIG and MPS configurations, memory strategies (AMP/BF16, checkpointing, ZeRO/FSDP), torch.compile trade-offs, and two-instance SAM3 patterns for H100.
  • Validation & safety: reproducible benchmarking, license cautions (SAM3, GPL models), encoder-replacement hazards, and guidance for stable inference in production.
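
The text→box→mask pattern named in the first bullet can be sketched as a two-stage composition: an open-vocabulary detector turns text prompts into scored boxes, and a promptable segmenter turns each box into a mask. The sketch below is a minimal illustration under stated assumptions, not the Skill's own code. Every name in it (Detection, Instance, text_to_masks, stub_detect, stub_segment, min_score) is hypothetical, and the stubs stand in for real models such as Grounding DINO or SAM, whose actual APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Mask = List[List[bool]]           # H x W boolean grid
Image = List[List[int]]           # placeholder grayscale image (hypothetical)


@dataclass
class Detection:
    label: str
    score: float
    box: Box


@dataclass
class Instance:
    label: str
    score: float
    box: Box
    mask: Mask


# Hypothetical stage interfaces: any detector/segmenter pair with these
# shapes can be plugged in.
Detector = Callable[[Image, Sequence[str]], List[Detection]]
Segmenter = Callable[[Image, Box], Mask]


def text_to_masks(image: Image, prompts: Sequence[str],
                  detect: Detector, segment: Segmenter,
                  min_score: float = 0.3) -> List[Instance]:
    """Run text -> box -> mask, dropping detections below min_score."""
    instances = []
    for det in detect(image, prompts):
        if det.score < min_score:
            continue  # skip low-confidence boxes before the costlier mask stage
        mask = segment(image, det.box)
        instances.append(Instance(det.label, det.score, det.box, mask))
    return instances


def stub_detect(image: Image, prompts: Sequence[str]) -> List[Detection]:
    """Stand-in detector: one top-left box per prompt, only the first is confident."""
    h, w = len(image), len(image[0])
    return [Detection(p, 0.9 if i == 0 else 0.1, (0, 0, w // 2, h // 2))
            for i, p in enumerate(prompts)]


def stub_segment(image: Image, box: Box) -> Mask:
    """Stand-in segmenter: fills the whole box as the mask."""
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(len(image[0]))]
            for y in range(len(image))]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    for inst in text_to_masks(image, ["cat", "dog"], stub_detect, stub_segment):
        print(inst.label, inst.score, inst.box)
```

Keeping the two stages behind plain callables is one possible design choice: it lets a detector or segmenter be swapped without touching the composition logic.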

Quick Start

Ask the Skill to design a text-to-instance-mask pipeline using SAM3 or Grounding DINO, specify the target hardware (e.g., H100 with MIG), and request code snippets along with memory estimates and validation steps.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: vlm-segmentation-engineering
Download link: https://github.com/AnastasiyaW/claude-code-config/archive/main.zip#vlm-segmentation-engineering

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
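
For a manual install, the extract-and-copy step described above could look roughly like the sketch below. It assumes the archive has already been downloaded and that the Skill lives in a folder named after the Skill somewhere inside it; the actual layout of the repository archive is not stated on this page, so that is an assumption. The function name extract_skill and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List


def extract_skill(zip_path: str, skill_name: str, skills_dir: str) -> List[Path]:
    """Copy files found under a folder called `skill_name` inside the archive
    into skills_dir/skill_name. Assumes (unverified) that such a folder exists."""
    dest_root = Path(skills_dir) / skill_name
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or skill_name not in parts:
                continue
            # Path of the file relative to the skill folder.
            rel = parts[parts.index(skill_name) + 1:]
            if not rel or ".." in rel:
                continue  # skip odd entries and path-traversal attempts
            target = dest_root.joinpath(*rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written


if __name__ == "__main__":
    # "main.zip" is a placeholder for wherever the download was saved.
    if Path("main.zip").exists():
        files = extract_skill("main.zip", "vlm-segmentation-engineering",
                              ".claude/skills")
        print(f"installed {len(files)} file(s)")
```

If the function returns an empty list, the folder-name assumption did not hold for the archive and its contents should be inspected by hand.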