offline-NPU-fault-diagnosis

Community

Diagnose NPU offline faults from server logs.

AuthorlinfordWu
Version1.0.0
Installs0

System Documentation

What problem does it solve?

This skill analyzes server logs to diagnose NPU offline faults and related PCIe, firmware, and memory issues by cross-correlating iBMC, OS Messages, and InfoCollect data.

Core Features & Use Cases

  • Multi-source log correlation across iBMC, OS Messages, and InfoCollect to identify root causes.
  • Stepwise diagnostic workflow (Step 0 to Step 4) to reconstruct fault timelines, trace conduction paths, and locate hardware coordinates (NPU ID and PCIe Slot).
  • Generates structured evidence and actionable recommendations for hardware replacement or firmware updates.

Quick Start

Provide the path to the server log bundle and run the skill to initiate an offline NPU fault diagnosis.

Dependency Matrix

Required Modules

None required

Components

scriptsreferences

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: offline-NPU-fault-diagnosis
Download link: https://github.com/linfordWu/owls/archive/main.zip#offline-npu-fault-diagnosis

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.