warmreset

Community

Quick GPU fault recovery without server restart.

Authordongg622
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This Skill enables operators to reset faulty GPU cards on the fly, avoiding time-consuming system reboots.

Core Features & Use Cases

  • Fault Recovery: Reset GPU hardware to clear errors and restore stability during maintenance or failure.
  • Compatibility: Supports physical servers, virtual machines, and multi-machine clusters with MetaXLink.
  • Use Case: Reduce downtime in large AI training clusters by performing hot resets of problematic GPUs without interrupting other workloads.

Quick Start

Issue a command to reset GPU 0 without rebooting the server.

Dependency Matrix

Required Modules

None required

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: warmreset
Download link: https://github.com/dongg622/china-ai-chip-skill/archive/main.zip#warmreset

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.