trtllm-serve-config-guide
Community
Generate a best-fit TRT-LLM serve config
Software Engineering #YAML #model deployment #config generation #KV cache #trtllm-serve #latency tuning #throughput optimization
Author: yo-steven
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill removes the trial-and-error of finding a good starting configuration for trtllm-serve by grounding the output YAML in the repository’s checked-in TensorRT-LLM configs and deployment guidance.
Core Features & Use Cases
- Source-backed config generation: Produces a starting trtllm-serve --config YAML by matching the user’s model + serving constraints to checked-in configs, avoiding speculative and out-of-scope modes by default.
- Objective-preserving selection: Keeps the user’s latency/throughput intent (e.g., Min Latency, Balanced, Max Throughput) by selecting configs using database profile labels when available.
- Guardrailed adjustments: Updates only scenario-dependent fields (like batch/token/seq limits and KV-cache settings) after reading the relevant model deployment docs, and flags any inferred/interpolated fields as unverified.
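The selection and guardrail behavior described in the features above can be pictured with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `ConfigEntry` record, the `select_config` and `adjust` helpers, and the placeholder model/GPU strings are not taken from the Skill or from TensorRT-LLM, and the real repository layout may differ. The keys `max_batch_size`, `max_num_tokens`, and `max_seq_len` are assumed to correspond to the batch/token/seq limits mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigEntry:
    """Hypothetical stand-in for one checked-in config and its profile label."""
    model: str
    gpu: str
    profile: str  # e.g. "Min Latency", "Balanced", "Max Throughput"
    settings: dict = field(default_factory=dict)


def select_config(entries, model, gpu, objective):
    """Return (entry, exact): exact is True only when the profile label matches."""
    candidates = [e for e in entries if e.model == model and e.gpu == gpu]
    if not candidates:
        return None, False
    for entry in candidates:
        if entry.profile == objective:
            return entry, True
    # No label matches the objective: fall back, but report the mismatch.
    return candidates[0], False


# Assumed set of scenario-dependent fields that may be adjusted.
ADJUSTABLE = ("max_batch_size", "max_num_tokens", "max_seq_len")


def adjust(entry, overrides):
    """Apply overrides to adjustable fields only; list every changed key as unverified."""
    settings = dict(entry.settings)
    unverified = []
    for key, value in overrides.items():
        if key not in ADJUSTABLE:
            raise ValueError(f"{key} is not a scenario-dependent field")
        if settings.get(key) != value:
            settings[key] = value
            unverified.append(key)
    return settings, unverified


entries = [
    ConfigEntry("example-model", "example-gpu", "Balanced", {"max_batch_size": 64}),
    ConfigEntry("example-model", "example-gpu", "Min Latency", {"max_batch_size": 8}),
]
chosen, exact = select_config(entries, "example-model", "example-gpu", "Min Latency")
settings, unverified = adjust(chosen, {"max_seq_len": 4096})
print(exact, settings, unverified)
```

The point of returning `exact` and `unverified` separately is that the caller can keep the user's objective visible and mark any value that did not come straight from a checked-in config.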
Quick Start
Ask for a non-speculative, single-node PyTorch aggregate in-flight-batching serve YAML for your model and GPU with your target concurrency and input/output lengths, specifying whether you want Min Latency, Balanced, or Max Throughput.
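As a rough illustration of why the request should include concurrency and input/output lengths, the sketch below derives lower bounds for two limits from those numbers. It is an assumption-laden sanity check, not the Skill's method: `minimum_limits` is a hypothetical helper, the key names are assumed to map to the seq and batch limits in the serve YAML, and sizing the batch limit to the target concurrency is a simplifying heuristic. Values in a checked-in config should take precedence.

```python
def minimum_limits(concurrency, input_len, output_len):
    """Lower bounds implied by a request shape (hypothetical helper)."""
    if min(concurrency, input_len, output_len) <= 0:
        raise ValueError("concurrency and lengths must be positive")
    return {
        # A request needs room for its prompt plus everything it generates.
        "max_seq_len": input_len + output_len,
        # Assumption: allow every concurrent request to be batched at once.
        "max_batch_size": concurrency,
    }


print(minimum_limits(concurrency=32, input_len=1024, output_len=512))
```

A generated config whose limits fall below these bounds is worth a second look before serving.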
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: trtllm-serve-config-guide Download link: https://github.com/yo-steven/skills-exploration-20260522/archive/main.zip#trtllm-serve-config-guide Please download this .zip file, extract it, and install it in the .claude/skills/ directory.