indirect-prompt-injection

Community

Protect prompts from injected retrieved content.

Authormaruakshay
Version1.0.0
Installs0

System Documentation

What problem does it solve?

Indirect prompt injection occurs when content fetched from external sources can influence model behavior, potentially compromising system prompts or leaking sensitive instructions. This guide provides guardrails to label, filter, and isolate retrieved content so it cannot override trusted prompts or execution paths.

Core Features & Use Cases

  • External-content labeling and trust-scoping for fetched blocks (source, trust level, allowed use).
  • Injection-pattern filtering to detect role-claims, instruction overrides, and delimiter breakouts before content enters prompts.
  • Isolation and auditing that route retrieved data through a trusted/information-only channel and log suspicious activity for post-incident review.
  • Use Case: Protect a chat assistant that ingests tickets, web pages, or emails from altering its system prompts or escalation logic.

Quick Start

Enable the external-content guardrails in your retrieval pipeline and run a test with a poisoned content sample.

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: indirect-prompt-injection
Download link: https://github.com/maruakshay/mii-ai-security/archive/main.zip#indirect-prompt-injection

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper to your Agent, search and equip skill from 471,000+ vetted skills library on demand.