Jailbreaking

Jailbreaking has become one of the most discussed topics in security, yet it is frequently misunderstood or conflated with prompt injection. While both exploit how language models process input, their approaches differ. This room cuts through the confusion by examining jailbreaking specifically: why models have safety restrictions in the first place, the classic techniques used to bypass them, and how communities pioneered adversarial prompt engineering. Begin your journey to understanding jailbreaking with this room!

Learning Objectives

Understand why models have "jails"
Distinguish between prompt injection and jailbreaking
Identify classic jailbreaking techniques and how they work
Recognise multi-turn jailbreaking strategies
Explore the phenomenon

Prerequisites

This room is part of a broader Security path. It is recommended that you complete this room in the intended order to establish core fundamentals. At a minimum, you should have all the required knowledge contained within the Prompt Injection room and know the foundational concepts covered in the / Security Threats room.

Answer the questions below

I understand the learning objectives and am ready to learn about jailbreaking!

Jailbreaking

Task 1Introduction

Learning Objectives

Prerequisites

Task 2Prompt Injection vs JailbreakingPremium

Task 3Why Models Have "Jails"Premium

Task 4Classic Jailbreak TechniquesPremium

Task 5Multi-turn Jailbreaking & ConditioningPremium

Task 6Case Study: Dan & the AI Security CommunityPremium

Task 7ChallengePremium

Task 8ConclusionPremium

Ready to learn Cyber Security?