vlm-segmentation-engineering

Name: vlm-segmentation-engineering
Author: AnastasiyaW

Production VLM & segmentation engineering

Category: Software Engineering
Tags: #segmentation #lora #diffusion #gpu-deployment #sam3 #vlm

Version: 1.0.0

System Documentation

What problem does it solve?

This skill gives production-oriented engineering guidance for building, integrating, and deploying vision-language models (VLMs), open-vocabulary segmentation pipelines, and diffusion-based image models on GPU infrastructure, with attention to predictable performance and explicit safety trade-offs.

Core Features & Use Cases

Model selection & pipelines: clear patterns for text→box→mask workflows using SAM3, SAM2.1, Grounding DINO, OWLv2, YOLO-World or hybrid stacks.
Diffusion engineering: architecture choices (UNet, DiT, Flux), schedulers, VAE handling, text encoder fusion, and recommended fine-tuning paths (LoRA → full fine-tune).
GPU deployment & optimization: MIG and MPS configurations, memory strategies (AMP/BF16, activation checkpointing, ZeRO/FSDP), torch.compile trade-offs, and patterns for running two SAM3 instances on an H100.
Validation & safety: reproducible benchmarking, license cautions (SAM3, GPL models), encoder-replacement hazards, and guidance for stable inference in production.
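
The reproducible benchmarking mentioned under Validation & safety can be sketched as a small latency harness. This is a minimal illustration rather than the skill's own tooling: the function name `benchmark` and its parameters (`warmup`, `iters`, `seed`) are hypothetical, and the timed callable stands in for a real inference call.

```python
import random
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3,
              iters: int = 20, seed: int = 0) -> Dict[str, float]:
    """Time fn() after warm-up runs; return latency statistics in milliseconds.

    Hypothetical helper for illustration. For GPU work, fn should block until
    the device has finished (for example by calling torch.cuda.synchronize()
    inside fn), otherwise only the kernel launch is timed.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    random.seed(seed)  # fix Python's RNG so randomized inputs repeat across runs
    for _ in range(warmup):
        fn()  # warm-up runs (caches, lazy init, compilation) are not timed
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Nearest-rank style index for the 95th percentile of the sorted samples.
    p95_index = min(len(samples_ms) - 1, int(round(0.95 * (len(samples_ms) - 1))))
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    print(benchmark(lambda: sum(range(10_000))))
```

Reporting the median and a tail percentile alongside the mean makes runs easier to compare, since a single slow iteration can distort the mean.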

Quick Start

Ask the skill to design a text-to-instance-mask pipeline using SAM3 or Grounding DINO, specify the target hardware (e.g., an H100 with MIG), and request code snippets together with memory estimates and validation steps.
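
As a rough picture of the kind of pipeline such a request describes, the text→box→mask flow can be sketched with two pluggable stages. Everything named here (`Detector`, `Segmenter`, `text_to_instances`, `min_score`, and the stub classes) is hypothetical and written for illustration. The stubs stand in for a real open-vocabulary detector and a promptable segmenter, and they do not reflect the actual SAM3 or Grounding DINO APIs.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Mask = List[List[bool]]          # H x W binary mask
Image = Sequence[Sequence[int]]  # toy grayscale image for the sketch


@dataclass
class Detection:
    label: str
    score: float
    box: Box


@dataclass
class Instance:
    label: str
    score: float
    box: Box
    mask: Mask


class Detector(Protocol):
    """Stage 1: text prompt -> scored boxes (hypothetical interface)."""
    def detect(self, image: Image, prompt: str) -> List[Detection]: ...


class Segmenter(Protocol):
    """Stage 2: box prompt -> binary mask (hypothetical interface)."""
    def segment(self, image: Image, box: Box) -> Mask: ...


def text_to_instances(image: Image, prompt: str, detector: Detector,
                      segmenter: Segmenter, min_score: float = 0.3) -> List[Instance]:
    """Run text -> box -> mask, dropping detections below min_score."""
    instances = []
    for det in detector.detect(image, prompt):
        if det.score < min_score:
            continue  # skip low-confidence boxes before the costlier mask stage
        mask = segmenter.segment(image, det.box)
        instances.append(Instance(det.label, det.score, det.box, mask))
    return instances


class StubDetector:
    """Stand-in detector that returns two fixed boxes for any prompt."""
    def detect(self, image: Image, prompt: str) -> List[Detection]:
        return [Detection(prompt, 0.9, (1, 1, 3, 3)),
                Detection(prompt, 0.1, (0, 0, 1, 1))]


class BoxFillSegmenter:
    """Stand-in segmenter that fills the box instead of predicting a mask."""
    def segment(self, image: Image, box: Box) -> Mask:
        x0, y0, x1, y1 = box
        height, width = len(image), len(image[0])
        return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
                for y in range(height)]


image = [[0] * 4 for _ in range(4)]
result = text_to_instances(image, "cat", StubDetector(), BoxFillSegmenter())
```

Keeping the detector and segmenter behind small interfaces lets either stage be swapped (for instance, a different detector in front of the same segmenter) without touching the orchestration code.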
