How do I log LLM prompts and outputs without leaking sensitive data?

Question

Accepted Answer

To log LLM prompts and outputs without leaking sensitive data, combine redaction, access controls, and tamper-evident logging. This addresses the MAESTRO L5 (Evaluation and Observability) threat of PII leaking through logs.

Redact sensitive information at ingestion: Strip or mask sensitive data such as API keys, credentials, and PII before a prompt is sent to the LLM and before any prompt or output is written to a log. In practice this means pattern matching (for example, regular expressions) that catches bearer tokens, well-known key prefixes, and key/secret parameters.

Implement tamper-evident audit logs: Store logs in write-once storage, sign each entry, or use an append-only ledger so that an attacker cannot silently delete or modify the audit trail. Shipping logs out-of-band to a Security Information and Event Management (SIEM) system with its own, separate access controls adds a further layer of protection.

Establish strict access controls and data residency: Embeddings can leak much of the text they were computed from, so apply the same access controls to a vector database as you would to the original text, and encrypt embeddings at rest. Scope memory strictly per tenant, and keep confidential data in physically or logically separate vector indexes to prevent memory contamination. Apply data residency labels to all data, and make sure routing logic honors those labels at the inference layer.

Maintain a data inventory and classification service: Continuously map where personal and sensitive data lives, how it flows, who can access it, and how long it is retained. Every component that produces or consumes data should consult the classification service, and derived data should inherit the classifications of its inputs.

Ensure comprehensive instrumentation and tracing: By default, log every action and decision the agent makes, emitting telemetry at every chokepoint (with redaction applied first). Implement distributed tracing with a stable trace ID propagated across every hop, so that agent activity can be reconstructed end to end for forensics.
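A minimal sketch of the "Redact sensitive information at ingestion" step, assuming Python and a hand-maintained list of regular expressions. The specific patterns (bearer tokens, `sk-`-style and `AKIA`-style key prefixes, `api_key`/`secret`/`token`/`password` parameters, email addresses) are illustrative assumptions rather than an exhaustive set, and `redact` / `REDACTION_PATTERNS` are hypothetical names. The idea is to run it on the prompt before it reaches the model, and on both prompt and output before anything is logged.

```python
import re

# Illustrative patterns only (assumptions); extend them for the secrets your stack issues.
REDACTION_PATTERNS = [
    # Bearer tokens, e.g. copied from an Authorization header
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9\-._~+/]+=*"), "Bearer [REDACTED]"),
    # Well-known key prefixes: "sk-" style API keys and AWS access key IDs ("AKIA...")
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"), "[REDACTED_KEY]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_KEY]"),
    # key/secret parameters: api_key=..., client_secret: ..., token=..., password=...
    (re.compile(r"(?i)(api[_-]?key|secret|token|password)(\s*[=:]\s*)([^\s&,;]+)"),
     r"\1\2[REDACTED]"),
    # Email addresses, as one simple example of PII
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the sanitized text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction only catches data with a recognizable shape; free-form PII such as names or street addresses usually needs a dedicated PII detection step in addition.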
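For the tamper-evident audit log, one common approach is a hash chain of signed entries: each entry's HMAC covers the previous entry's MAC, so editing, reordering, or removing an earlier entry breaks verification of everything after it. This sketch assumes an HMAC-SHA256 key held outside the logging host (for example in a KMS); `HashChainedLog` is a hypothetical class that keeps entries in memory as a stand-in for write-once storage.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


class HashChainedLog:
    """Append-only log where each entry's HMAC also covers the previous entry's MAC."""

    def __init__(self, key: bytes):
        self._key = key  # assumption: fetched from a KMS, never stored alongside the logs
        self._entries: list = []

    def _mac(self, record_json: str, prev: str) -> str:
        payload = f"{prev}\n{record_json}".encode("utf-8")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, record: dict) -> str:
        # Canonical JSON so the same record always serializes to the same bytes.
        record_json = json.dumps(record, sort_keys=True, separators=(",", ":"))
        prev = self._entries[-1]["mac"] if self._entries else GENESIS
        mac = self._mac(record_json, prev)
        self._entries.append({"record": record_json, "prev": prev, "mac": mac})
        return mac

    def verify(self) -> bool:
        """Recompute the chain; an edited, reordered, or removed (non-trailing) entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            expected = self._mac(entry["record"], prev)
            if not hmac.compare_digest(expected, entry["mac"]):
                return False
            prev = entry["mac"]
        return True
```

A chain alone cannot detect truncation of its newest entries, which is one reason to also ship the latest MAC (or the entries themselves) out-of-band to the SIEM.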
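The rule that derived data inherits the classifications of its inputs can be expressed as "take the most restrictive input label". The four-level `Classification` scale and the `derive_classification` / `can_write` helpers below are hypothetical; a real classification service would define its own labels and policy interface.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical four-level scale; higher values are more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def derive_classification(*inputs: Classification) -> Classification:
    """Derived data (summaries, embeddings, log lines) inherits its most sensitive input's label."""
    if not inputs:
        # Fail closed: data with unknown lineage cannot be classified by inheritance.
        raise ValueError("derived data must have at least one classified input")
    return max(inputs)


def can_write(label: Classification, sink_clearance: Classification) -> bool:
    """A sink (log store, vector index, SIEM) may only hold data at or below its clearance."""
    return label <= sink_clearance
```

For example, a summary built from a public document and a confidential ticket is itself confidential, so under this policy it should not be written to a log store cleared only for internal data unless it is redacted first.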
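One way to stamp a stable trace ID on every log line is Python's standard `logging` module plus a `contextvars.ContextVar`, whose value is inherited by `asyncio` tasks created within the same context. The sketch assumes JSON-lines output; `start_trace`, `make_logger`, and the `agent` logger name are hypothetical. In production you would more likely use OpenTelemetry, which propagates trace context between services via the W3C `traceparent` header.

```python
import contextvars
import io
import json
import logging
import sys
import uuid
from typing import Optional, TextIO

# Trace ID for the current request or agent run; asyncio tasks inherit a copy of the context.
current_trace_id = contextvars.ContextVar("trace_id", default="-")


def start_trace(incoming: Optional[str] = None) -> str:
    """Reuse an upstream trace ID if one was received, otherwise mint a new one."""
    trace_id = incoming or uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id


class TraceIdFilter(logging.Filter):
    """Stamps every record passing through the handler with the active trace ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "trace_id": getattr(record, "trace_id", "-"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        })


def make_logger(stream: TextIO = sys.stderr) -> logging.Logger:
    logger = logging.getLogger("agent")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    start_trace()
    log.info("tool_call: search")  # messages should already have passed through redaction
    print(buf.getvalue(), end="")
```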
Define explicit retention and deletion rules: Set explicit retention periods and deletion rules for sensitive information, especially in memory components. This includes workflows that propagate deletions to derived data, and documentation of any data that cannot be deleted.
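Retention and deletion propagation can be modeled by recording, for each stored item, which items it was derived from. The sketch below assumes a single in-memory store and one fixed retention period; `Item`, `MemoryStore`, and the `deletable` flag are hypothetical. Deletion walks the lineage graph to everything derived from an expired item, and anything that cannot be deleted is collected in `undeletable` so it can be documented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Item:
    """A stored memory item plus the IDs of the items it was derived from."""
    item_id: str
    created_at: datetime
    derived_from: set = field(default_factory=set)
    deletable: bool = True  # hypothetical flag, e.g. False for data under legal hold


class MemoryStore:
    def __init__(self, retention: timedelta):
        self.retention = retention
        self.items = {}           # item_id -> Item
        self.undeletable = set()  # IDs that must be documented as not deletable

    def add(self, item: Item) -> None:
        self.items[item.item_id] = item

    def delete_with_derivatives(self, root_id: str) -> set:
        """Delete root_id and everything transitively derived from it."""
        seen, stack = set(), [root_id]
        while stack:
            current = stack.pop()
            if current in seen or current not in self.items:
                continue
            seen.add(current)
            # Follow lineage edges to items that were derived from `current`.
            stack.extend(i.item_id for i in self.items.values() if current in i.derived_from)
        deleted = set()
        for item_id in sorted(seen):
            if self.items[item_id].deletable:
                del self.items[item_id]
                deleted.add(item_id)
            else:
                self.undeletable.add(item_id)
        return deleted

    def purge_expired(self, now: Optional[datetime] = None) -> set:
        """Apply the retention rule, propagating each expiry to derived data."""
        now = now or datetime.now(timezone.utc)
        expired = [i.item_id for i in self.items.values() if now - i.created_at > self.retention]
        removed = set()
        for item_id in expired:
            removed |= self.delete_with_derivatives(item_id)
        return removed
```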
