"A Theory of Prompt Injection (and why you should study roles)" by Charles Ye, softboiledheart cover art

"A Theory of Prompt Injection (and why you should study roles)" by Charles Ye, softboiledheart

"A Theory of Prompt Injection (and why you should study roles)" by Charles Ye, softboiledheart

Listen for free

View show details
SummaryWe've been building a theory of how prompt injections work under the hood.We show it comes down to how LLMs perceive roles (the humble chat template tags).We use this theory to create new attacks, explain some weird mech interp results, and predict when attacks work.We also advocate for a new subfield focused on the science of roles, and sketch some unexplored new research problems.Work supported by CBAI and Cosmos. Another version of this post (with more inline colors) is here, and full ICML paper here. 1. The World to an LLM How does an LLM know the difference between its own thoughts and someone else's words? To see why this is hard, let's look at what the world actually looks like to a model. Here's a simple chat where we ask Claude to check the day of the week. I took a snapshot of it midway through its follow-up response: Left = what we see; right = what the LLM gets. On the left is what we see in the chat interface: a structured conversation with distinct turns. On the right is what the model actually receives as input: a single, continuous stream [...] ---Outline:(00:12) Summary[... 15 more sections]--- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
adbl_web_anon_alc_button_suppression_t1
No reviews yet
In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.