This paper summarises evidence-based AI security risks and mitigations drawn from published research, regulatory frameworks, and operational practice.
Introduction
Generative AI systems are now routinely deployed in environments that affect customers, employees, finances, and regulatory obligations. In many organisations, large language models (LLMs) are integrated with retrieval systems, automation tools, and business workflows to provide decision support or operational assistance. While these systems offer material benefits, recent research and regulatory analysis show that they introduce distinct security, compliance, and human-impact risks that differ from traditional software systems (OWASP, Prompt Injection, 2024).
This paper focuses on documented, real-world failure modes and the controls that materially reduce risk, rather than speculative future concerns.
System Model and Assumptions
This analysis assumes a common enterprise AI architecture consisting of:
- An LLM accessed via API
- A retrieval layer that injects external documents into the prompt (RAG)
- Optional agent or function-calling mechanisms
- Logging and monitoring infrastructure
In many current implementations, system instructions, user inputs, and retrieved content are concatenated into a single prompt context. From a security perspective, this collapses multiple trust domains into one, creating opportunities for unintended influence over model behaviour (OWASP, Prompt Injection, 2024).
Threat Landscape
Nefarious Manipulation
Research and industry guidance have established that AI systems can be manipulated without compromising the underlying model or infrastructure. By introducing carefully crafted text into inputs or retrieved documents, an attacker can influence outputs, override intended constraints, or bias responses. This class of attack — commonly referred to as prompt injection — is now formally tracked as a top-tier AI security risk (OWASP, Prompt Injection, 2024).
Importantly, these attacks exploit design assumptions, not vulnerabilities in the model itself.
Retrieval-Related Risk
Retrieval-Augmented Generation systems introduce additional exposure. Academic research has demonstrated that poisoning or manipulating a retrieval corpus can cause persistent changes in system behaviour, leading to confidently generated but misleading outputs grounded in compromised sources (Zou et al., USENIX Security, 2024). Because retrieval failures are often silent, users may be unaware that the system’s outputs are based on unreliable evidence.
Compliance and Data Protection Risk
Prompts as Regulated Records
From a regulatory standpoint, AI prompts must be treated as data records. Prompts may include personal data, confidential business information, or sensitive contextual details. Where prompts are logged, retained, or reused, they fall squarely under data protection obligations such as GDPR principles of data minimisation, purpose limitation, and storage limitation (GDPR Article 5).
Regulatory guidance increasingly recognises that AI systems do not exempt organisations from existing data protection duties simply because the processing is automated (GDPR, Regulation (EU) 2016/679).
Governance and Accountability
Regulators have also highlighted the risk of over-reliance on probabilistic systems in high-impact contexts. Draft regulatory frameworks, including the EU AI Act, emphasise the need for human oversight, auditability, and clear accountability where AI systems influence decisions affecting rights, wellbeing, or access to services (European Commission, EU AI Act Proposal).
Human Impact and Morale
Beyond technical and legal considerations, AI systems can materially affect human behaviour. Studies and regulatory reviews note that people often attribute undue authority to systems that present information confidently and fluently. In sensitive contexts — such as health, employment, or wellbeing — this can lead to inappropriate reliance on outputs that were never designed to function as authoritative advice (ACM Computing Surveys, AI Security & Safety, 2024).
The risk is not malice, but misplaced trust.
Evidence-Based Mitigations
Trust-Domain Separation
Security guidance consistently recommends enforcing hard separation between trusted system instructions and untrusted content, including user inputs and retrieved documents. This architectural control prevents untrusted text from altering the model’s intended operating constraints and directly addresses the root cause of prompt injection attacks (OWASP, Prompt Injection Cheat Sheet, 2024).
Evidence Thresholds and Verification
To mitigate retrieval risks, systems should require minimum confidence thresholds and traceable sources before producing definitive outputs. Research on poisoned retrieval shows that without such controls, systems can propagate compromised information while appearing reliable (Zou et al., USENIX Security, 2024).
Prompt Minimisation and Retention Controls
Limiting what data can enter prompts, applying masking or redaction where necessary, and enforcing strict retention policies reduces compliance exposure. These measures directly support GDPR obligations and reduce the likelihood of unlawful data processing or excessive retention (GDPR Article 5).
Output Constraints and Human Escalation
In high-impact use cases, outputs should be constrained to predefined formats and escalated to human review when uncertainty is detected. This approach aligns with emerging regulatory expectations for human-in-the-loop governance in AI-assisted decision-making (European Commission, EU AI Act Proposal).
Threat, Control, and Outcome Summary
Conclusion
Evidence from security taxonomies, peer-reviewed research, and emerging regulation demonstrates that AI risk is primarily a function of system design and governance, not model intelligence. Nefarious manipulation, compliance failure, and human harm arise when probabilistic systems are granted authority without sufficient constraint. Regulator-ready AI systems therefore prioritise trust boundaries, evidence requirements, and human oversight, ensuring that confidence never exceeds justification.
References
- OWASP (2024). Generative AI Risk Taxonomy: Prompt Injection. https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- OWASP (2024). LLM Prompt Injection Prevention Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
- Zou, J. et al. (2024). Poisoned Retrieval: Attacks on Retrieval-Augmented Generation. USENIX Security Symposium. https://www.usenix.org/system/files/usenixsecurity25-zou-poisonedrag.pdf
- GDPR (2016). Regulation (EU) 2016/679, Article 5 – Principles Relating to Processing of Personal Data. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679
- European Commission (2023). Proposal for a Regulation Laying Down Harmonised Rules on Artificial Intelligence (EU AI Act). https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/artificial-intelligence_en
This paper is intended to support informed discussion and does not constitute legal advice.
Authored by Richard Flores-Moore who is a senior finance and technology transformation leader with experience in governance and regulated systems. This paper is informed by practical work involving AI deployments with defined guardrails and oversight, including experience with GhostGen.AI.
Hastags: #AISecurity #AIRegulation #Governance #DataProtection #RiskManagement #GhostGen.AI