trtllm-serve-config-guide


Generate a best-fit TRT-LLM serve config

Author: yo-steven
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill removes the trial-and-error of finding a good starting configuration for trtllm-serve by grounding the output YAML in the repository’s checked-in TensorRT-LLM configs and deployment guidance.

Core Features & Use Cases

  • Source-backed config generation: Produces a starting trtllm-serve --config YAML by matching the user’s model + serving constraints to checked-in configs, avoiding speculative and out-of-scope modes by default.
  • Objective-preserving selection: Keeps the user’s latency/throughput intent (e.g., Min Latency, Balanced, Max Throughput) by selecting configs using database profile labels when available.
  • Guardrailed adjustments: Updates only scenario-dependent fields (like batch/token/seq limits and KV-cache settings) after reading the relevant model deployment docs, and flags any inferred/interpolated fields as unverified.
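The selection behavior described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `ConfigEntry` fields, the `select_config` helper, and the nearest-match rule are assumptions made for illustration, not the Skill's actual implementation or the layout of the TensorRT-LLM config database. It only shows the idea of filtering on model, GPU, and profile label first, then marking any non-exact match as unverified.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical record describing one checked-in serve config."""
    model: str
    gpu: str
    profile: str      # e.g. "Min Latency", "Balanced", "Max Throughput"
    concurrency: int
    isl: int          # input sequence length
    osl: int          # output sequence length
    path: str         # hypothetical location of the YAML file


def select_config(
    catalog: Sequence[ConfigEntry],
    model: str,
    gpu: str,
    profile: str,
    concurrency: int,
    isl: int,
    osl: int,
) -> Tuple[Optional[ConfigEntry], bool]:
    """Return (entry, verified).

    Model, GPU, and profile must match exactly so the user's objective is
    preserved. Among those candidates, the entry closest in sequence
    lengths, then concurrency, is chosen. `verified` is True only when the
    scenario matches exactly; otherwise the result should be reported as
    inferred and unverified.
    """
    candidates = [
        e for e in catalog
        if e.model == model and e.gpu == gpu and e.profile == profile
    ]
    if not candidates:
        return None, False

    def distance(e: ConfigEntry) -> Tuple[int, int]:
        return (abs(e.isl - isl) + abs(e.osl - osl),
                abs(e.concurrency - concurrency))

    best = min(candidates, key=distance)
    return best, distance(best) == (0, 0)


if __name__ == "__main__":
    # Placeholder catalog; names and paths are invented for the example.
    catalog = [
        ConfigEntry("example-model", "example-gpu", "Balanced", 16, 1024, 1024, "a.yaml"),
        ConfigEntry("example-model", "example-gpu", "Balanced", 64, 1024, 1024, "b.yaml"),
        ConfigEntry("example-model", "example-gpu", "Min Latency", 1, 1024, 1024, "c.yaml"),
    ]
    entry, verified = select_config(
        catalog, "example-model", "example-gpu", "Balanced", 48, 1024, 1024
    )
    print(entry.path if entry else None, "verified" if verified else "unverified")
```

Returning the `verified` flag alongside the entry keeps the "flag inferred fields" guardrail explicit: a caller cannot obtain a nearest-match config without also learning that it was not an exact scenario match.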

Quick Start

Ask for a serve YAML for your model and GPU, scoped to non-speculative, single-node, PyTorch, aggregate in-flight-batching serving. State your target concurrency and input/output lengths, and say whether you want Min Latency, Balanced, or Max Throughput.

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: trtllm-serve-config-guide
Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#trtllm-serve-config-guide

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.