offline-NPU-fault-diagnosis
Community
Diagnose NPU offline faults from server logs.
Author: linfordWu
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill analyzes server logs to diagnose NPU offline faults and related PCIe, firmware, and memory issues by cross-correlating iBMC, OS Messages, and InfoCollect data.
Core Features & Use Cases
- Multi-source log correlation across iBMC, OS Messages, and InfoCollect to identify root causes.
- Stepwise diagnostic workflow (Step 0 to Step 4) that reconstructs the fault timeline, traces fault propagation paths, and locates the affected hardware (NPU ID and PCIe slot).
- Generates structured evidence and actionable recommendations for hardware replacement or firmware updates.
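The multi-source correlation idea can be illustrated with a minimal sketch. It is not the skill's actual implementation: the `LogEvent` type, the helper names, the 60-second window, and the sample messages are all hypothetical, and real iBMC, OS Messages, and InfoCollect entries would first need to be parsed into this shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical normalized log entry; not the skill's real data model."""
    timestamp: datetime
    source: str   # "iBMC", "OS Messages", or "InfoCollect"
    message: str


def build_timeline(*sources):
    """Merge per-source event lists into one list ordered by timestamp."""
    merged = [event for events in sources for event in events]
    return sorted(merged, key=lambda e: e.timestamp)


def events_before(timeline, fault_time, window=timedelta(seconds=60)):
    """Return events that fall within `window` up to and including the fault time."""
    return [e for e in timeline if fault_time - window <= e.timestamp <= fault_time]


# Made-up sample entries, for illustration only.
ibmc = [
    LogEvent(datetime(2024, 1, 1, 10, 0, 5), "iBMC", "PCIe link error (sample)"),
]
os_messages = [
    LogEvent(datetime(2024, 1, 1, 10, 0, 20), "OS Messages", "NPU offline (sample)"),
    LogEvent(datetime(2024, 1, 1, 9, 0, 0), "OS Messages", "boot complete (sample)"),
]
info_collect = [
    LogEvent(datetime(2024, 1, 1, 10, 0, 12), "InfoCollect", "heartbeat lost (sample)"),
]

timeline = build_timeline(ibmc, os_messages, info_collect)
fault_time = datetime(2024, 1, 1, 10, 0, 20)
related = events_before(timeline, fault_time)

for event in related:
    print(event.timestamp.isoformat(), event.source, event.message)
```

Sorting all sources onto one time axis is what makes it possible to see which events preceded the NPU going offline; the window size is an arbitrary choice here and would need tuning against real logs.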
Quick Start
Provide the path to the server log bundle and run the skill to start diagnosing NPU offline faults.
Dependency Matrix
Required Modules
None required
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: offline-NPU-fault-diagnosis
Download link: https://github.com/linfordWu/owls/archive/main.zip#offline-npu-fault-diagnosis
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.