What is a security checklist for hardening a vector database used by an AI agent?

Question

Accepted Answer

Hardening a vector database for an AI agent means treating it as a primary data store: apply robust access controls and add specific measures against data leakage and integrity issues. The checklist below addresses the OWASP LLM08 Vector and Embedding Weaknesses risk.

Treat vector databases as primary data stores: Do not relax access controls. Vector stores can contain reconstruction-grade representations of sensitive data, both through embeddings that are open to inversion and through stored context summaries.

Implement strict access controls: For access control purposes, treat the vector database as if it contained the original text. Apply explicit access checks on memory retrieval so that every retrieval is access-controlled.

Encrypt embeddings at rest: Encrypt embeddings where warranted, especially for highly sensitive data, to mitigate embedding inversion attacks.

Partition memory stores: Use strict per-tenant memory scoping, per-tenant/source partitioning, and separate physical or logical vector indexes for confidential data to prevent memory contamination across sessions or tenants.

Enforce data classification inheritance: Any data derived from classified inputs, such as embeddings or summaries, must inherit at least the classification of its inputs to prevent PII leakage through derived data. This is part of the Data & Memory Governance function.

Manage right-to-erasure: Maintain a per-user data inventory across all stores and implement deletion workflows that propagate to derived data. Document any data that cannot be deleted, give users explicit notice of it, and make architectural choices that minimize the proliferation of personal data copies. This falls under the NIST AI RMF Govern function (L6 Security & Compliance).

Address data residency: Apply residency labels to all data and implement routing logic that respects residency at the inference layer to prevent data residency violations. This is also part of the Data & Memory Governance function.

Sanitize ingested content: Sanitize content before it is ingested into the vector database to prevent retrieval poisoning.

Validate retrieval relevance: Implement mechanisms to validate that retrieved content is relevant to the query.
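Several of the items above (per-tenant partitioning, access-controlled retrieval, classification inheritance, relevance validation, and erasure that reaches derived data) can be illustrated together. The following is a minimal in-memory sketch, not any particular vector database's API: `TenantScopedStore`, `Record`, `Classification`, and every field and parameter name are hypothetical, and a real deployment would map these checks onto the access-control and namespace features of whatever store it uses.

```python
import math
from dataclasses import dataclass, field
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical three-level scheme; higher value means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Record:
    record_id: str
    owner_id: str                      # user the source data belongs to
    embedding: tuple[float, ...]
    classification: Classification


def inherit_classification(inputs: list[Classification]) -> Classification:
    """Derived data gets at least the highest classification of its inputs."""
    return max(inputs)


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TenantScopedStore:
    # One logical index per tenant; a query never touches another tenant's index.
    _indexes: dict[str, list[Record]] = field(default_factory=dict)

    def add(self, tenant_id: str, record: Record) -> None:
        self._indexes.setdefault(tenant_id, []).append(record)

    def search(self, tenant_id: str, clearance: Classification,
               query: tuple[float, ...], k: int = 3,
               min_score: float = 0.0) -> list[Record]:
        # Access check first: drop records above the caller's clearance.
        allowed = [r for r in self._indexes.get(tenant_id, [])
                   if r.classification <= clearance]
        # Relevance check: discard hits scoring below the threshold.
        scored = [(cosine(query, r.embedding), r) for r in allowed]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:k]]

    def erase_user(self, tenant_id: str, owner_id: str) -> int:
        """Delete every embedding derived from one user's data; return the count."""
        if tenant_id not in self._indexes:
            return 0
        before = self._indexes[tenant_id]
        kept = [r for r in before if r.owner_id != owner_id]
        self._indexes[tenant_id] = kept
        return len(before) - len(kept)
```

In this sketch the access filter runs before similarity scoring, so a caller without clearance never learns that a confidential record was a close match. Keeping `owner_id` on every record is one way to let a deletion workflow find derived embeddings without a separate lookup.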
