cluster-fast-slow-rank-detector

Community

Diagnose cluster fast/slow cards with rules

Author: kali20gakki
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

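Purpose-built for diagnosing fast and slow cards (ranks) in Ascend cluster profiling data. Given a cluster data path, it applies Expert Rules to diagnose the slow card at the macro level, drills down to the micro-level root cause, and outputs the bottleneck category together with the supporting difference evidence.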

Core Features & Use Cases

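  • Macro diagnosis based on expert rules: automatically classifies a slow card as a Host dispatch bottleneck, pure compute slowness, or communication slowness, and outputs the differences through script-based comparison.
  • Micro evidence: calls the comparison scripts compare_api_stats.py and compare_op_stats.py under scripts/ to compare API and operator time between the slow card and the fast card, pinpointing the bottleneck.
  • Applicable scenarios: in multi-rank cluster profiling, reports the slow card's RankID, the fast-card baseline, and the top 20 differences.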
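The macro classification step can be illustrated with a minimal sketch. It is not the Skill's actual rule set: the RankSummary fields, the category labels, and the min_gap_ms threshold are all hypothetical, and the rule used here (attribute the slowdown to whichever component shows the largest extra time relative to the fast-card baseline) is only one plausible heuristic.

```python
from dataclasses import dataclass


@dataclass
class RankSummary:
    """Hypothetical per-rank time summary; field names are assumptions."""
    rank_id: int
    host_free_ms: float  # device idle time spent waiting for host dispatch
    compute_ms: float    # time spent in compute operators
    comm_ms: float       # time spent in communication operators


def classify_slow_rank(slow: RankSummary, fast: RankSummary, min_gap_ms: float = 1.0):
    """Return (category, gaps), where gaps maps each category to slow minus fast time."""
    gaps = {
        "host_dispatch_bottleneck": slow.host_free_ms - fast.host_free_ms,
        "slow_compute": slow.compute_ms - fast.compute_ms,
        "slow_communication": slow.comm_ms - fast.comm_ms,
    }
    # Pick the component with the largest extra time on the slow rank.
    category, gap = max(gaps.items(), key=lambda item: item[1])
    if gap < min_gap_ms:
        # Hypothetical guard: the ranks are too close to call.
        return "no_significant_gap", gaps
    return category, gaps


fast = RankSummary(rank_id=0, host_free_ms=5.0, compute_ms=100.0, comm_ms=20.0)
slow = RankSummary(rank_id=3, host_free_ms=40.0, compute_ms=102.0, comm_ms=18.0)
category, gaps = classify_slow_rank(slow, fast)
print(slow.rank_id, category, gaps)
```

In practice, real expert rules would likely need more context than three totals, for example because ranks waiting on a slow peer can themselves show inflated communication time.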

Quick Start

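From the cluster data root directory, specify the slow-card rank and the fast-card rank, then run this Skill's comparison scripts to generate the diagnostic report.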
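The command-line interface of the comparison scripts is not documented here, so the sketch below does not call them. Instead it shows, with pandas, the kind of slow-versus-fast comparison described above: join two per-operator tables and keep the 20 largest differences. The function name top_differences and the column names name and total_time_us are assumptions for illustration, not the scripts' real schema.

```python
import pandas as pd


def top_differences(slow_df, fast_df, key="name", value="total_time_us", n=20):
    """Rank operators (or APIs) by how much more time they take on the slow card."""
    merged = slow_df.merge(
        fast_df, on=key, how="outer", suffixes=("_slow", "_fast")
    ).fillna(0.0)  # an entry missing on one card counts as zero time there
    merged["diff"] = merged[f"{value}_slow"] - merged[f"{value}_fast"]
    return merged.sort_values("diff", ascending=False).head(n).reset_index(drop=True)


# Toy data standing in for per-rank operator statistics.
slow_ops = pd.DataFrame(
    {"name": ["MatMul", "Add", "Cast"], "total_time_us": [500.0, 50.0, 10.0]}
)
fast_ops = pd.DataFrame(
    {"name": ["MatMul", "Add"], "total_time_us": [300.0, 55.0]}
)
report = top_differences(slow_ops, fast_ops)
print(report)
```

An outer join is used so that operators appearing on only one card still show up in the report rather than being silently dropped.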

Dependency Matrix

Required Modules

pandas

Components

scripts

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: cluster-fast-slow-rank-detector
Download link: https://github.com/kali20gakki/mindstudio-skills/archive/main.zip#cluster-fast-slow-rank-detector

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.