Prompt Defence

There are few certainties when it comes to LLMs, and mitigations are no exception. In cyber security, it is often the case that vulnerability X can be fixed by applying patch Y. The same cannot be said for vulnerabilities like prompt injection and jailbreaking: the underlying non-deterministic nature of LLMs has concerning implications for security. That is not to say defensive measures cannot be taken. While you cannot guarantee immunity to prompt injection, you can make a successful attack far less likely. This room walks through how.

Prerequisites

For this room you must know the fundamentals of , as covered in the / Security Threats room. It is also recommended that you complete the Prompt Injection and Jailbreaking rooms, as these establish a lot of context for this room.

Learning Objectives

Understand why LLM security is fundamentally probabilistic, and why this means no single defence can fully prevent attacks.
Recognise how system prompt hardening raises the bar against prompt injection, and what its limits are.
Understand how input and output guardrails work, where they fail, and the trade-offs involved in deploying them.
Identify how deployment controls and least privilege reduce the damage when attacks succeed.
Understand why defence-in-depth is the only realistic approach to LLM security.
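
To make the probabilistic framing in these objectives concrete, here is a minimal sketch of the arithmetic behind defence-in-depth. The layer names and bypass rates are hypothetical, illustrative numbers (not measurements), and the calculation assumes the layers fail independently, which real defences rarely do; treat it as an intuition aid rather than a risk model.

```python
from math import prod


def combined_bypass_probability(layer_bypass_rates):
    """Chance that a single attack attempt gets past every layer.

    Assumes each layer fails independently of the others (a simplification).
    """
    return prod(layer_bypass_rates)


def success_within_attempts(per_attempt_probability, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_probability) ** attempts


# Hypothetical per-attempt bypass rates, chosen only for illustration.
layers = {
    "hardened system prompt": 0.30,
    "input guardrail": 0.20,
    "output guardrail": 0.25,
}

single_layer = layers["hardened system prompt"]
stacked = combined_bypass_probability(layers.values())

print(f"One layer only, per attempt:   {single_layer:.3f}")
print(f"All layers stacked, per attempt: {stacked:.3f}")
print(f"Stacked, within 100 attempts:  {success_within_attempts(stacked, 100):.3f}")
```

Under these assumptions, stacking layers lowers the per-attempt success rate well below that of any single layer, but it never reaches zero, and a persistent attacker who retries many times still has a meaningful chance of getting through. This is why limiting what a successful attack can do matters as much as lowering its likelihood.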


Task 1: Introduction
Task 2: Probabilistic Security
Task 3: System Prompt Hardening
Task 4: Guardrails
Task 5: Securing Deployment
Task 6: Bypassing Guardrails
Task 7: Conclusion
