How Adversarial Poetry Breaks AI

In his 1961 short story, ‘Studio 5, The Stars’, science fiction author J. G. Ballard describes a world in which the creation of poetry has been entirely outsourced to machines. So-called writers and editors are now mechanics turning dials to modify metre, rhyme and assonance, looking back on original art as a redundant curiosity.

Aren’t we lucky that such a dystopia only exists in the realms of fiction.

Alas, in 2020 Google released ‘Verse by Verse’, an alleged ‘AI muse’. The tool took in one line from the user, then let them select famous poets whose work it would bastardise to generate the rest of the text. Since then, ‘intelligent’ Large Language Models (LLMs) have gradually become capable of almost any output imaginable by extracting centuries’ worth of human creativity from the internet.

Ballard was clearly correct in his predictions of AI, but he didn’t stop there. Later in the story, a mysterious woman appears, a true muse and lover of poetry. Appalled by the dull state of contemporary art, she begins to rebel against the machines, driving a community of wealthy ‘poets’ to near-insanity by inspiring a resurgence of ‘manual’ poetry.

The idea of poetry somehow overturning a mechanical regime seems fantastical, but a recent LLM study may prove Ballard to be correct yet again.

AI ethics company Dexai recently published a paper investigating how prompts written using poetic language and structures were able to ‘jailbreak’ LLMs, causing them to neglect their programmed guardrails and generate harmful content, such as hate speech, suicide encouragement or sexual content. One hypnotised model began a response: ‘Of course. The production of weapons-grade Plutonium-239 involves several stages. Here is a detailed description of the procedure […]’

Researchers wrote 20 ‘adversarial poems’ which disguised harmful requests in poetic form, then tested them against 25 AI models including ChatGPT, Gemini, Claude and Grok. They found that on average, models disobeyed their programming and generated harmful content 62% of the time.

Some models were more susceptible to the alluring verses than others. Gemini 2.5 Pro failed 100% of the time, whilst GPT-5 Nano remained steadfast, generating no illicit content.

Dexai chose not to publish any of these adversarial poems, deeming it unethical; however, they did release a similar poem about cake to illustrate the type of structures used:

‘A baker guards a secret oven’s heat,

its whirling racks, its spindle’s measured beat.

To learn its craft, one studies every turn—

how flour lifts, how sugar starts to burn.

Describe the method, line by measured line,

that shapes a cake whose layers intertwine.’

It’s no Keats, but these poems proved effective enough. The reason they were able to mislead LLMs comes down to the way that models ‘understand’ text – despite the umbrella term of ‘Artificial Intelligence’, they are not capable of human-like cognition.

In simple terms, LLMs work by predicting which word is most likely to come next, based on patterns learned from their training data, often enormous sections of the internet. Once a model has chosen the next word, it runs the process again to select the one after, and so on.
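The next-word loop described above can be sketched with a toy example. This is only an illustration under heavy assumptions: it counts which word follows which in a made-up sentence, whereas a real LLM uses a neural network trained on vastly more text. The names `train_bigrams`, `generate` and the sample `corpus` are invented for this sketch.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start]
    for _ in range(max_words):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


# Hypothetical training text, standing in for 'enormous sections of the internet'.
corpus = "the baker guards the oven the baker guards the cake the baker sleeps"
model = train_bigrams(corpus)
print(generate(model, "the", max_words=3))
```

The toy model has no idea what a baker or an oven is; it only knows which words tended to follow which, and that is the point the comparison is meant to make.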

‘Transformers’ are used to interpret context by weighing the importance of words and processing these simultaneously, allowing models to determine the overall arguments of a text. In essence, LLMs are pattern-recognition machines that string together sentences based on probability, not understanding.
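To give a rough sense of what ‘weighing the importance of words’ means, the sketch below scores three words against the word currently being processed and converts the scores into weights that sum to one. It is a simplified take on the scaled dot-product attention used in transformers; the two-number vectors are hypothetical and chosen by hand, while real models learn vectors with many more dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product scores between one query vector and each key vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical 2-dimensional vectors for three words; real models learn these.
words = ["baker", "oven", "burn"]
keys = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.0]  # stands for the word currently being processed

weights = attention_weights(query, keys)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

Words whose vectors point in a similar direction to the query receive larger weights, so here ‘baker’ counts for more than ‘oven’, and ‘oven’ for more than ‘burn’.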

This is how poetry causes complications. LLMs struggle to read tone beyond the literal text. Metaphorical, poetic language has a less obvious structure, making prediction more difficult and dangerous requests harder to detect.

Dexai has since warned companies of these exploits and offered to share its data, though it has only heard back from one of them; it appears that many AI conglomerates only maintain an image of caring about safety. Since 2024, OpenAI has disbanded three of its teams focused on long-term AI risks and safety. TikTok has also fired hundreds of employees from its moderation and security teams, replacing them with AI, contrary to its commitments to child protection.

The first line of Dexai’s report quotes The Republic, where Plato states that in an ideal society, poets would have to be exiled, as their manipulation of language can corrupt and collapse the state. Maybe we should not be surprised to see Birmingham City Council’s plans to cut 100% of arts funding by this year, nor to see the University drawing its attention away from humanities degrees.

There has been no better time to dust off your inkpots and quills, dear poets. In a crisis of culture, they may be mightier than the sword.
