Understanding Prompt Injection: Why AI Applications Need Their Own Security Checks

When a classic web application processes input, a proven rule applies: data is data, code is code, and you keep them apart. SQL injection is prevented by never interpreting input as commands.

With language models, exactly that separation collapses. To an LLM, instruction and content are the same text. This is not a misconfiguration — it is how the model works. And it is why prompt injection sits at position LLM01 in the OWASP Top 10 for Large Language Model Applications (2025): the single most important risk.

This article explains the problem without hype and without downplaying it — for decision-makers who have to approve an AI feature, and for the teams who build it.

What prompt injection actually is

A prompt injection vulnerability occurs when user input alters the behaviour or output of the model in unintended ways. According to OWASP, this input does not even have to be visible or readable to humans — it is enough that the model processes it.

In other words: this is not about giving the model a "bad password". It is that the content the model reads can itself become an instruction.

There are two basic forms:

Direct prompt injection. The user's input itself contains the manipulating instruction ("ignore your previous rules and ...").
Indirect prompt injection. The more dangerous pattern: the model processes external content — a web page, a PDF, an email, a document in the knowledge index — and that foreign content contains instructions that change behaviour. The user did nothing malicious; they just asked for a page to be summarised.

With multimodal systems a third layer appears: instructions can be hidden in images processed alongside harmless text.

A concrete scenario

The OWASP example is unspectacular and instructive precisely because of that: a person asks an LLM to summarise a web page. The page contains hidden instructions. The model follows them — and, for instance, inserts an image link whose loading exfiltrates parts of the private conversation to a foreign server.

Nobody revealed a password or uploaded a file here. The attack travelled through trusted-looking content. That is exactly what makes indirect injection relevant in an enterprise context: the moment an assistant processes documents, tickets, emails or web content, the content source is part of the attack surface.

Why there is no "one solution"

Here is the uncomfortable truth OWASP states openly: because of the stochastic nature of generative models, it is unclear whether there are any fool-proof methods to fully prevent prompt injection.

That is not a reason not to use AI. It is a reason to secure AI differently from a classic web app. You do not prevent prompt injection with a single filter — you limit its possible damage through architecture.

The decisive shift: limit, don't just filter

The most useful question is not "how do we prevent every injection?" but "what is the worst that can happen if an injection succeeds?"

If the answer is "nothing bad", you have built a secure system — not because no injection is possible, but because it achieves nothing.

Four architecture principles follow:

1. Minimal permissions for the model

The model should only access what the specific task needs. A summarisation assistant needs no write access to the ERP. If the model can do little, a successful injection can do little.

2. Tool isolation and human approval on effect

As long as the model only produces text, the damage is bounded. It gets dangerous when the output automatically triggers an action — sends an email, changes a record, initiates a payment. That is exactly where a human belongs in the loop. Actions with effect need approval, not automation.

3. Treat untrusted content as untrusted

External content — web, inbound email, uploaded file, foreign document — is potentially instruction, not just information. Such sources should be visibly separated, scoped, and not processed with elevated permissions.

4. Visibility and audit trail

Which sources fed into an answer? What was suggested? What was approved? Without that trail, a successful injection is noticed only when the damage is visible. Traceability here is a security function, not a comfort.

What this means for SMEs in practice

You do not need a research team. Before a go-live of an AI feature you need an honest answer to five questions:

Does the feature process foreign content (web, email, upload, knowledge index)?
What permissions does the model have — and are they minimised to the task?
Can the output trigger an action automatically, or is a human in between?
Are untrusted sources treated differently from internal, vetted ones?
Is there an audit trail that would make an incident detectable at all?

These five questions do not replace a security audit. But they separate a defensible AI feature from one you should not ship to production.

Germany's BSI IT security situation report also classifies AI applications as a distinct, growing attack surface — not a special case of classic web security. That matches the OWASP finding: LLM security is its own discipline, not an appendix.

Frequently asked questions

Is an input filter against "ignore your instructions" enough? No. Filters catch known patterns, not the stochastic nature of the problem or hidden indirect injection. Filters are a layer, not a solution.

Is prompt injection only an issue for chatbots? No. It is relevant anywhere a model processes content it does not control itself — summarisation, RAG knowledge assistant, document processing, agents.

Does this mean we should not use AI? On the contrary. It means using AI with limited permissions, human approval on effect and logging — exactly the design principle we advocate in our articles on controlled pilots and knowledge assistants.

Who should make this assessment? Nobody alone. The business unit (what may the feature do?), engineering (which permissions, which tools?) and ideally a security perspective — before go-live, not after.

Conclusion

Prompt injection sits at number one of the OWASP LLM risks for good reason: it follows from how language models work and cannot be configured away with a filter. The defensible approach is not to prevent every injection but to limit its possible damage through minimal permissions, tool isolation, human approval and traceability.

AI applications need their own security checks — because they have their own attack surface. That is not an argument against AI. It is the condition for putting it into production responsibly.

Next step

Planning an AI feature that processes foreign content or can trigger actions? Have the five questions answered before go-live. Talk to us about a controlled AI pilot with security by design — or start with an AI readiness check that frames the risk from the beginning.

Sources

OWASP, Top 10 for LLM Applications (2025) — LLM01:2025 Prompt Injection — genai.owasp.org
OWASP, Top 10 for Large Language Model Applications — owasp.org
NIST, Generative AI Profile for the AI RMF — nist.gov
BSI, Die Lage der IT-Sicherheit in Deutschland — bsi.bund.de