Topic: Model Safety
1 post tagged Model Safety.
AI Security · 4 min read
Prompt Injection Is Role Confusion: New Research Reframes LLM Security
MIT researchers show that frontier LLMs can't truly distinguish their own privileged reasoning from attacker-injected text, and that writing style alone swings attack success from 61% to 10%.
- prompt injection
- llm security
- agentic ai
- jailbreak
- model safety