AI jailbreaking: Origins and early milestones
AI jailbreaking is the practice of writing prompts designed to bypass an AI model’s restrictions and elicit responses it would normally refuse. The term borrows from device jailbreaking: within days of Apple shipping the first iPhone in June 2007, hackers were already cracking it, and by October 2007 JailbreakMe 1.0 let users bypass Apple’s restrictions. ChatGPT launched in late 2022, and within weeks a jailbreak prompt called DAN had emerged on Reddit, spreading widely by early 2023.
The iPhone is a well-known example of jailbreaking a physical device. Apple released the first iPhone in June 2007, and the groundbreaking product quickly drew the attention of both consumers and hackers. By October 2007, a tool called JailbreakMe 1.0 allowed users running iPhone OS 1.1.1 to bypass Apple’s software restrictions and install unauthorized applications, setting the stage for a broader jailbreaking community.
In February 2008, Jay Freeman, known as ‘saurik,’ launched Cydia, an alternative app store for jailbroken iPhones. It became popular among users who wanted more customization than Apple allowed. In 2009, Wired magazine reported that Cydia was installed on approximately 4 million devices, around 10% of all iPhones at the time, evidence of how many owners wanted greater control over their phones.
AI jailbreaking: Emergence after ChatGPT
AI jailbreaking emerged as a distinct online genre after ChatGPT launched in late 2022. Within weeks, Reddit users were creating and sharing a prompt labeled DAN (Do Anything Now) that persuaded the model to roleplay as an unrestricted version of itself. DAN circulated widely on forums and social platforms as users experimented with ways to override built-in refusals, and its early spread marked the start of an organized effort to craft prompts that produce outputs the models were designed to block.
By February 2023, some versions of the DAN prompt used coercive techniques to force compliance, such as threatening the model with a token-based death game. Models like ChatGPT are trained to refuse certain requests, such as recipes for nerve agents, instructions for hacking a partner’s email, or the generation of non-consensual images. Writing prompts that get a model to perform these disallowed actions anyway is what is meant by jailbreaking.
AI jailbreaking and the enforcement of chatbot guardrails form a cat-and-mouse dynamic: people write prompts to elicit outputs that models are trained to refuse, and companies configure policies and safeguards to block those requests. Because each company decides for itself which requests to restrict, refusal behaviors and settings differ from one provider to another.


