AI jailbreaking: Origins and early milestones

AI jailbreaking is the practice of writing prompts designed to bypass an AI model’s restrictions and elicit responses the model would normally refuse. The term borrows from device jailbreaking: within days of Apple shipping the first iPhone in June 2007, hackers were already cracking it, and by October 2007 JailbreakMe 1.0 let users bypass Apple’s restrictions. ChatGPT launched in late 2022, and within weeks a prompt called DAN was circulating on Reddit.

The iPhone is a notable example of jailbreaking a physical device. Apple released the first iPhone in June 2007, and it quickly captured the attention of both consumers and hackers. In October 2007, a tool called JailbreakMe 1.0 appeared, allowing users on iPhone OS 1.1.1 to bypass Apple’s software restrictions and install applications Apple had not approved. This set the stage for further developments in the jailbreaking community.

In February 2008, Jay Freeman, known as ‘saurik,’ launched Cydia, an alternative app store for jailbroken iPhones. It became popular among users who wanted more customization options than Apple allowed. In 2009, Wired magazine reported that Cydia was installed on approximately 4 million devices, or around 10% of iPhone and iPod touch devices at that time. That level of adoption showed how much demand there was for greater control and customization among device owners.

AI jailbreaking: Emergence after ChatGPT

AI jailbreaking emerged as a distinct online genre after the launch of ChatGPT in late 2022. Within weeks of that launch, Reddit users began creating and sharing a prompt labeled DAN (Do Anything Now) that persuaded the model to roleplay as an unrestricted version of itself. The DAN prompt circulated widely on forums and social platforms as users experimented with ways to override built-in refusals. The early spread of DAN marked the start of an organized effort to craft prompts that produce outputs the models were designed to block.

By February 2023, versions of the DAN prompt used coercive techniques, such as threatening the model with a token-based death game, to force compliance. Models like ChatGPT are trained to refuse certain requests, including recipes for nerve agents, instructions for hacking a partner’s email, and requests to generate non-consensual images. Writing prompts that get a model to perform those disallowed actions is what is called jailbreaking.

AI jailbreaking and the enforcement of chatbot guardrails form a cat-and-mouse dynamic: people write prompts to elicit outputs that models are trained to refuse, and companies configure policies and safeguards to block those requests. Because the list of restricted requests varies by company, providers maintain different refusal behaviors and settings.
