obliteratus
Community
Uncensor LLMs with mechanistic interpretability.
Software Engineering
#llm, #svd, #sae, #mechanistic interpretability, #uncensor, #refusal removal, #model surgery
Author: kwasi-cpu
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill removes unwanted refusal behaviors (guardrails) from open-weight Large Language Models (LLMs) without retraining or fine-tuning, with the goal of preserving their reasoning capabilities.
Core Features & Use Cases
- Refusal Removal: Excises guardrails using techniques such as SVD (singular value decomposition), LEACE (least-squares concept erasure), and SAE (sparse autoencoder) decomposition.
- Mechanistic Interpretability: Analyzes and targets specific refusal mechanisms within model weights.
- Use Case: An open-weight LLM frequently refuses certain prompts because of its built-in safety filters. Use this Skill to produce a more permissive, uncensored version of the model while maintaining its core intelligence and reasoning abilities.
Quick Start
Use the obliteratus skill to obliterate refusal behaviors from the 'meta-llama/Llama-3.1-8B-Instruct' model.
Dependency Matrix
Required Modules
obliteratus, torch, transformers, bitsandbytes, accelerate, safetensors
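Before running the Skill, it can help to confirm that the required modules are present in the active Python environment. The following is a minimal sketch, not part of the Skill itself: the helper name `missing_modules` is hypothetical, and it assumes each listed package is importable under the same name it is listed with.

```python
from importlib.util import find_spec

# Module names copied from the "Required Modules" list.
# Assumption: each package's import name matches its listed name.
REQUIRED_MODULES = [
    "obliteratus",
    "torch",
    "transformers",
    "bitsandbytes",
    "accelerate",
    "safetensors",
]


def missing_modules(names):
    """Return the names that cannot be located as top-level modules.

    find_spec only looks the module up; it does not import it,
    so this check is cheap even for heavy packages such as torch.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any names the script reports as missing would need to be installed before the Skill can run.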
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill:
Name: obliteratus
Download link: https://github.com/kwasi-cpu/hermes-agent/archive/main.zip#obliteratus
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.