palmetto-apptainer-libcuda-fix


Repair CUDA driver visibility in Palmetto jobs.

Author: KwongFuk
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill restores CUDA driver visibility inside the container for Palmetto Slurm jobs in which Apptainer or vLLM workers fail with a missing libcuda error even though the host provides /lib64/libcuda.so.1. It patches the job launcher and adds a smoke test so the fix can be verified before the real job is resubmitted.

Core Features & Use Cases

  • Patch the sbatch launcher to create a scratch-local CUDA compatibility directory and symlink the host libcuda into the container path used by the worker.
  • Bind the compatibility path into the container and adjust LD_LIBRARY_PATH for the preflight and real runs.
  • Add a lightweight preflight and a short smoke test to verify that libcuda.so and libcuda.so.1 can be loaded by a Python process inside the container before resubmitting the real job.
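
The first two steps above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual patch: the directory name `cuda-compat`, the container path `/opt/cuda-compat`, and all function names are assumptions made for the example. Only the host path /lib64/libcuda.so.1 comes from the description above.

```python
from pathlib import Path


def prepare_compat_dir(scratch_dir, host_libcuda="/lib64/libcuda.so.1"):
    """Create a scratch-local directory holding libcuda symlinks.

    The directory name "cuda-compat" is a hypothetical choice.
    """
    compat = Path(scratch_dir) / "cuda-compat"
    compat.mkdir(parents=True, exist_ok=True)
    for name in ("libcuda.so.1", "libcuda.so"):
        link = compat / name
        # Replace stale links so the function can be re-run safely.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(host_libcuda)
    return compat


def bind_spec(host_dir, container_dir="/opt/cuda-compat"):
    """Build a "src:dest" string for Apptainer's --bind option.

    The container-side path is an assumed placeholder.
    """
    return f"{host_dir}:{container_dir}"


def prepend_library_path(new_dir, current=""):
    """Return an LD_LIBRARY_PATH value with new_dir first and no duplicates."""
    new_dir = str(new_dir)
    rest = [p for p in current.split(":") if p and p != new_dir]
    return ":".join([new_dir] + rest)
```

A symlink is resolved against the container's own filesystem, so this sketch only works if the link target is also visible inside the container. If it is not, binding the host library file as well, or copying it into the compatibility directory, may be needed.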

Quick Start

Apply the patch to the sbatch launcher and run the included smoke test before resubmitting the original job.
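
For orientation, a smoke test of this kind might look like the sketch below. It is not the test shipped with the skill; the function names are hypothetical. It only checks that the dynamic loader can open libcuda.so and libcuda.so.1 from a Python process, using the standard `ctypes` module.

```python
import ctypes


def check_libcuda(names=("libcuda.so", "libcuda.so.1")):
    """Try to load each library name.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc) or "load failed"
    return results


def main():
    """Print one line per library and return a shell-style exit code."""
    results = check_libcuda()
    for name, err in results.items():
        status = "ok" if err is None else f"FAILED: {err}"
        print(f"{name}: {status}")
    return 0 if all(err is None for err in results.values()) else 1
```

Calling `main()` from a script executed inside the container, with the same bind and LD_LIBRARY_PATH settings as the real job, and passing its return value to `sys.exit`, would let the launcher stop early when the driver library is still not visible.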

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: palmetto-apptainer-libcuda-fix
Download link: https://github.com/KwongFuk/codex-skills/archive/main.zip#palmetto-apptainer-libcuda-fix

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.