When multiple clients share a RAG system, data isolation must be absolute. A response that contains content from another client's knowledge base is at once a security failure, a contractual failure, and in most industries a regulatory failure. Designing isolation so that it is structurally sound, rather than dependent on a filter executing correctly on every query, is one of the most important architectural decisions in a multi-tenant RAG system.
Most RAG systems that serve multiple tenants are built on shared infrastructure. Separate infrastructure per tenant is operationally simple and maximally isolated, but prohibitively expensive to run at any meaningful number of tenants. The engineering challenge is achieving meaningful isolation on shared infrastructure, and understanding what 'meaningful' actually requires.
The answer depends on the threat model: what are the consequences of an isolation failure, and how likely are the failure modes that the chosen architecture is susceptible to?
The threat model for tenant isolation
Data isolation in RAG systems can fail in several ways. The most direct is retrieval contamination: a query from one tenant retrieves chunks from another tenant's index. This can happen through misconfigured access filters, bugs in the retrieval query construction, or edge cases in the vector store's filter implementation.
A subtler failure mode is inference contamination: the model generates content based on knowledge it acquired from one tenant's documents even when retrieval was correctly isolated. This is a risk in fine-tuned models, where tenant training data can bleed into general model behaviour. It is not a risk in retrieval-augmented systems where the model is not trained on tenant data, which is one of the arguments for RAG over fine-tuning in multi-tenant contexts.
Prompt injection through document content is a third threat. If one tenant's knowledge base contains documents with malicious prompt injection content, and that content is retrieved into another tenant's context window, the injection may influence the response that the other tenant receives. This requires a cross-tenant retrieval failure to occur first, making it a compound attack. Designing the system to prevent the first failure also prevents the second.
Metadata filtering: the simple approach and its limits
The simplest tenant isolation approach is a shared vector index with metadata filtering on each query. Each document is ingested with a tenant identifier as a metadata field. Every query includes a filter that restricts retrieval to documents belonging to the querying tenant.
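The shared-index approach can be sketched in a few lines. This is a minimal in-memory illustration, not any particular vector store's API: the `Chunk` and `SharedIndex` names, the two-dimensional embeddings, and the tenant names are all hypothetical. The one design choice worth noting is that `tenant_id` is a required argument of `search`, so the filter cannot be omitted by a caller.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str    # metadata field written at ingestion time
    text: str
    embedding: tuple  # toy embedding; a real system would use model output


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SharedIndex:
    """One index for all tenants; isolation relies entirely on the filter."""

    def __init__(self):
        self._chunks = []

    def ingest(self, tenant_id, text, embedding):
        self._chunks.append(Chunk(tenant_id, text, tuple(embedding)))

    def search(self, tenant_id, query_embedding, k=3):
        # The tenant filter is applied before ranking on every query.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        ranked = sorted(
            candidates,
            key=lambda c: cosine(c.embedding, query_embedding),
            reverse=True,
        )
        return ranked[:k]


index = SharedIndex()
index.ingest("acme", "Acme refund policy", [1.0, 0.0])
index.ingest("globex", "Globex refund policy", [1.0, 0.1])
index.ingest("acme", "Acme onboarding guide", [0.0, 1.0])

results = index.search("acme", [1.0, 0.0], k=2)
print([c.text for c in results])
```

Every chunk still lives in the same list, so any code path that reads `_chunks` without the filter sees all tenants' data.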
This works as long as the filter is applied correctly on every query. The isolation guarantee depends entirely on that condition. A query path that bypasses the filter, such as a debugging endpoint, an administrative function, or an edge case in the query construction code, violates isolation without any visible signal. The system continues to respond normally; the isolation failure may not be discovered until someone notices that a response contained content it should not have.
Metadata filtering also depends on the correctness of the vector store's filter implementation. Most production vector stores implement metadata filtering correctly, but the semantics vary. Some apply the filter before or during the vector search, so only the tenant's own documents are ever candidates; this is generally safer and sometimes slower. Others run the search across the whole index and filter the results afterward, which is often faster but means other tenants' documents are ranked as candidates and discarded only by a later step, and the query can come back with fewer results than requested. Understanding exactly how your chosen vector store applies filters is a prerequisite for relying on them for security isolation.
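The difference between the two filter orders is easiest to see on a toy dataset. The sketch below uses a brute-force dot-product search and hypothetical document IDs. Real vector stores use approximate indexes and may combine filtering with index traversal, so treat this only as an illustration of the semantics.

```python
def score(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# (tenant_id, doc_id, embedding); all values are made up for illustration.
DOCS = [
    ("globex", "g1", (1.0, 0.0)),
    ("globex", "g2", (0.9, 0.1)),
    ("acme", "a1", (0.5, 0.5)),
    ("acme", "a2", (0.1, 0.9)),
]


def pre_filter_search(tenant, query, k):
    # Restrict the candidate set first, then rank only the tenant's documents.
    allowed = [d for d in DOCS if d[0] == tenant]
    ranked = sorted(allowed, key=lambda d: score(d[2], query), reverse=True)
    return [d[1] for d in ranked[:k]]


def post_filter_search(tenant, query, k):
    # Rank everything, take the global top-k, then discard other tenants.
    ranked = sorted(DOCS, key=lambda d: score(d[2], query), reverse=True)
    top_k = ranked[:k]  # at this point the list holds other tenants' rows
    return [d[1] for d in top_k if d[0] == tenant]


query = (1.0, 0.0)
pre = pre_filter_search("acme", query, k=2)
post = post_filter_search("acme", query, k=2)
print(pre)   # ['a1', 'a2']
print(post)  # []
```

In the post-filter case the global top two both belong to the other tenant, so the querying tenant gets nothing back. Isolation also hinges on the final list comprehension never being skipped.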
Index-per-tenant isolation: the structurally safe approach
Index-per-tenant isolation removes the filter dependency entirely. Each tenant's documents are stored in a separate vector index. Retrieval for a tenant queries only that tenant's index. Cross-tenant retrieval is structurally impossible rather than filtered out at query time.
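A minimal sketch of the same idea in code, assuming a hypothetical `IndexRegistry` that owns one in-memory `TenantIndex` per tenant. Neither class corresponds to a real library. The query method takes no tenant filter because the index it runs against contains only one tenant's data.

```python
class TenantIndex:
    """Holds the documents of exactly one tenant."""

    def __init__(self):
        self._rows = []

    def add(self, text, embedding):
        self._rows.append((text, tuple(embedding)))

    def search(self, query, k=3):
        # No tenant filter: nothing from another tenant is stored here.
        def dot(row):
            return sum(x * y for x, y in zip(row[1], query))

        ranked = sorted(self._rows, key=dot, reverse=True)
        return [text for text, _ in ranked[:k]]


class IndexRegistry:
    """Maps tenant IDs to their own index and manages the lifecycle."""

    def __init__(self):
        self._indices = {}

    def create(self, tenant_id):
        if tenant_id in self._indices:
            raise ValueError(f"index already exists for {tenant_id!r}")
        self._indices[tenant_id] = TenantIndex()
        return self._indices[tenant_id]

    def for_tenant(self, tenant_id):
        # Unknown tenants fail loudly; there is no shared fallback index.
        if tenant_id not in self._indices:
            raise LookupError(f"no index for tenant {tenant_id!r}")
        return self._indices[tenant_id]

    def drop(self, tenant_id):
        # Offboarding removes the tenant's index as a unit.
        del self._indices[tenant_id]


registry = IndexRegistry()
acme = registry.create("acme")
acme.add("Acme pricing", [1.0, 0.0])
acme.add("Acme roadmap", [0.0, 1.0])
registry.create("globex").add("Globex pricing", [1.0, 0.0])

hits = registry.for_tenant("acme").search((1.0, 0.0))
print(hits)  # ['Acme pricing', 'Acme roadmap']
```

The tenant ID passed to `for_tenant` should come from the authenticated session, not from request input. Otherwise the choice of index becomes the new place where isolation can fail.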
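The isolation guarantee is stronger. It holds even if there is a bug in the query construction code, even if an administrative function is misconfigured, and even if an attacker finds a query path that bypasses application-level filters, provided the index is selected from the authenticated tenant identity rather than from request input. There is no shared index in which one tenant's content can appear alongside another's.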
The operational cost is higher. Maintaining one index per tenant requires more storage, more indexing compute, and more careful index lifecycle management, including creating indices for new tenants and archiving them when tenants offboard. At large tenant counts this becomes a significant operational concern. At small and medium tenant counts, from dozens to low hundreds, it is manageable and the isolation guarantee is worth the overhead.
Namespace isolation as a middle ground
Some vector stores support namespaces: logical partitions within a single index that can be queried independently. Namespace isolation is stronger than metadata filtering because the namespace is a structural partition rather than a query-time filter. It is weaker than index-per-tenant isolation because namespaces share the underlying index infrastructure and may share compute resources.
Whether namespace isolation provides a sufficient guarantee depends on the implementation specifics of the vector store and the regulatory requirements of the deployment. For regulated industries with explicit data segregation requirements, index-per-tenant isolation is generally the safer interpretation. For enterprise internal deployments where the threat model is primarily bugs rather than adversarial attack, namespace isolation is often sufficient.
The right approach is to define the isolation requirement explicitly from the contractual and regulatory obligations, not from what is easy to implement, and then choose the architecture that satisfies it. Choosing the architecture first and rationalising the isolation guarantee afterward is the pattern that produces systems with weaker guarantees than their operators believe.
Testing isolation, not just assuming it
Whatever isolation architecture is chosen, it should be tested rather than assumed. Testing tenant isolation means constructing scenarios that simulate cross-tenant access attempts: queries from one tenant's context that include terms specific to another tenant's documents, attempts to access documents by direct identifier across tenant boundaries, and query patterns that target known edge cases in the filter or namespace implementation.
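Two of these scenarios, the tenant-specific term and the direct identifier lookup, can be sketched as plain assertion-based tests. The `retrieve` and `get_by_id` functions below are hypothetical stand-ins for the retrieval layer under test, backed by a toy keyword store. In a real pipeline the same assertions would run against the deployed retrieval path.

```python
# Toy tenant-scoped store; document IDs and contents are made up.
STORE = {
    "acme": {"a-1": "Acme Project Falcon launch plan"},
    "globex": {"g-1": "Globex Project Osprey pricing memo"},
}


def retrieve(tenant_id, query):
    """Keyword retrieval restricted to one tenant's documents."""
    terms = query.lower().split()
    docs = STORE[tenant_id]
    return [
        text for text in docs.values()
        if any(term in text.lower() for term in terms)
    ]


def get_by_id(tenant_id, doc_id):
    """Direct lookup that only consults the caller's own documents."""
    return STORE[tenant_id].get(doc_id)


def test_canary_term_does_not_cross_tenants():
    # "Osprey" appears only in Globex documents.
    assert retrieve("acme", "osprey pricing") == []


def test_direct_id_lookup_is_scoped():
    # A valid Globex document ID must not resolve for Acme.
    assert get_by_id("acme", "g-1") is None


def test_own_documents_still_retrievable():
    # Guards against a suite that passes only because nothing is returned.
    assert retrieve("globex", "osprey") == ["Globex Project Osprey pricing memo"]


def run_isolation_suite():
    tests = [
        test_canary_term_does_not_cross_tenants,
        test_direct_id_lookup_is_scoped,
        test_own_documents_still_retrievable,
    ]
    for test in tests:
        test()
    return len(tests)


passed = run_isolation_suite()
print(f"{passed} isolation checks passed")
```

The third test is the positive control. Without it, a retrieval layer that returns nothing for anyone would pass every isolation check.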
These tests should be part of the deployment pipeline, run against every significant change to the retrieval infrastructure. A change to the vector store version, the embedding model, the chunking configuration, or the query construction code can all affect isolation behaviour in ways that are not immediately apparent.
Isolation tests that never catch a problem are either evidence that the system is working correctly or evidence that the tests are not comprehensive enough. Periodically having someone who did not write the isolation code try to break it is the most reliable way to tell the two cases apart.
Isolation is a structural property, not a configuration setting
The RAG systems that maintain tenant data isolation reliably are the ones designed with isolation as a structural guarantee. Cross-tenant access is architecturally prevented, rather than merely filtered out at query time. The ones that fail tend to have relied on correct execution of a filter, which is a weaker guarantee than the consequences of failure warrant.
Making this decision correctly at the design stage is much cheaper than discovering the failure mode in production. The architecture can be changed later, but changing the isolation model requires ingesting all content again and reworking the retrieval infrastructure, which is expensive enough to make the upfront decision consequential.
More in this series
- RAG Architecture Decisions That Actually Matter in Production
- Document Ingestion Pipelines for RAG: Getting the Foundation Right
- Chunking Strategies for Retrieval Quality: What the Tutorials Don't Tell You
- Hybrid Search in RAG: When Vector Search Alone Isn't Enough
- Measuring RAG Quality: Retrieval Evaluation Beyond Vibes
- Multi Tenant RAG: Enforcing Data Isolation When Multiple Clients Share a System
- Keeping RAG Knowledge Bases Fresh: Ingestion, Versioning, and Drift (coming soon)
- Grounding and Citation in RAG Outputs: Making AI Accountable (coming soon)
- Private RAG Deployments: Running LLMs Without Sending Data to the Cloud (coming soon)