Multi-tenant RAG with Amazon Bedrock Knowledge Bases
Organizations are continuously looking for ways to use their proprietary knowledge and domain expertise to gain a competitive edge. With the advent of foundation models (FMs) and their remarkable natural language processing capabilities, a new opportunity has emerged to unlock the value of their data assets.
As organizations strive to deliver personalized experiences to customers using generative AI, it becomes paramount to specialize the behavior of FMs using their own data and that of their customers. Retrieval Augmented Generation (RAG) has emerged as a simple yet effective approach to achieve a desired level of specialization.
Amazon Bedrock Knowledge Bases is a fully managed capability that simplifies the management of the entire RAG workflow, empowering organizations to give FMs and agents contextual information from the company's private data sources to deliver more relevant and accurate responses tailored to their specific needs.
For organizations building multi-tenant products, such as independent software vendors (ISVs) creating software as a service (SaaS) offerings, the ability to personalize experiences for each of their customers (tenants of their SaaS application) is particularly significant. This personalization can be achieved by implementing a RAG approach that selectively uses tenant-specific data.
In this post, we discuss and provide examples of how to achieve personalization using Amazon Bedrock Knowledge Bases. We focus particularly on addressing the multi-tenancy challenges that ISVs face, including data isolation, security, tenant management, and cost management. We concentrate on scenarios where the RAG architecture is integrated into the ISV application and not directly exposed to tenants. Although the specific implementations presented in this post use Amazon OpenSearch Service as a vector database to store tenants' data, the challenges and architectural solutions proposed can be extended and tailored to other vector store implementations.
Multi-tenancy design considerations
When architecting a multi-tenant RAG system, organizations need to take several considerations into account:
- Tenant isolation – One crucial consideration in designing multi-tenant systems is the level of isolation between the data and resources related to each tenant. These resources include data sources, ingestion pipelines, vector databases, and the RAG client application. The level of isolation is typically governed by the security, performance, and scalability requirements of your solution, together with your regulatory requirements. For example, you may need to encrypt the data related to each of your tenants using a different encryption key. You may also need to make sure that high activity generated by one of the tenants doesn't affect other tenants.
- Tenant variability – A similar but distinct consideration is the level of variability of the features provided to each tenant. In the context of RAG systems, tenants might have varying requirements for data ingestion frequency, document chunking strategy, or vector search configuration.
- Tenant management simplicity – Multi-tenant solutions need a mechanism for onboarding and offboarding tenants. This dimension determines the degree of complexity for this process, which might involve provisioning or tearing down tenant-specific infrastructure, such as data sources, ingestion pipelines, vector databases, and RAG client applications. This process could also involve adding or deleting tenant-specific data in its data sources.
- Cost-efficiency – The operating costs of a multi-tenant solution depend on the way it provides the isolation mechanism for tenants, so designing a cost-efficient architecture for the solution is essential.
These four considerations need to be carefully balanced and weighted to suit the needs of the specific solution. In this post, we present a model to simplify the decision-making process. Using the core isolation concepts of silo, pool, and bridge defined in the SaaS Tenant Isolation Strategies whitepaper, we propose three patterns for implementing a multi-tenant RAG solution using Amazon Bedrock Knowledge Bases, Amazon Simple Storage Service (Amazon S3), and OpenSearch Service.
A typical RAG solution using Amazon Bedrock Knowledge Bases is composed of several components, as shown in the following figure:
The main challenge in adapting this architecture for multi-tenancy is determining how to provide isolation between tenants for each of the components. We propose three prescriptive patterns that cater to different use cases and offer varying levels of isolation, variability, management simplicity, and cost-efficiency. The following figure illustrates the trade-offs between these three architectural patterns in terms of achieving tenant isolation, variability, cost-efficiency, and simplicity of tenant management.
Multi-tenancy patterns
In this section, we describe the implementation of these three different multi-tenancy patterns in a RAG architecture based on Amazon Bedrock Knowledge Bases, discussing their use cases as well as their pros and cons.
Silo
The silo pattern, illustrated in the following figure, offers the highest level of tenant isolation, because the entire stack is deployed and managed independently for each single tenant.
In the context of the RAG architecture implemented by Amazon Bedrock Knowledge Bases, this pattern prescribes the following:
- A separate data source per tenant – In this post, we consider the scenario in which tenant documents to be vectorized are stored in Amazon S3, therefore a separate S3 bucket is provisioned per tenant. This allows for per-tenant AWS Key Management Service (AWS KMS) encryption keys, as well as per-tenant S3 lifecycle policies to manage object expiration, and object versioning policies to maintain multiple versions of objects. Having separate buckets per tenant provides isolation and allows for customized configurations based on tenant requirements.
- A separate knowledge base per tenant – This allows for a separate chunking strategy per tenant, and it's particularly useful if you envision the document base of your tenants to be different in nature. For example, one of your tenants might have a document base composed of flat text documents, which can be treated with fixed-size chunking, whereas another tenant might have a document base with explicit sections, for which semantic chunking would be better suited. Having a different knowledge base per tenant also lets you decide on different embedding models, giving you the possibility to choose different vector dimensions, balancing accuracy, cost, and latency (an example data source configuration follows this list). You can choose a different KMS key per tenant for the transient data stores, which Amazon Bedrock uses for end-to-end per-tenant encryption. You can also choose per-tenant data deletion policies to control whether your vectors are deleted from the vector database when a knowledge base is deleted. Separate knowledge bases also mean that you can have different ingestion schedules per tenant, allowing you to comply with different data freshness standards with your customers.
- A separate OpenSearch Serverless collection per tenant – Having a separate OpenSearch Serverless collection per tenant allows you to have a separate KMS encryption key per tenant, maintaining per-tenant end-to-end encryption. For each tenant-specific collection, you can create a separate vector index, therefore choosing for each tenant the distance metric between Euclidean and dot product, so you can decide how much importance to give to the document length. You can also choose the specific settings of the HNSW algorithm per tenant to control memory consumption, cost, and indexing time. Each vector index, together with the setup of metadata mappings in your knowledge base, can have a different metadata set per tenant, which can be used to perform filtered searches. Metadata filtering can be used in the silo pattern to restrict the search to a subset of documents with a specific attribute. For example, one of your tenants might be uploading dated documents and wants to filter documents pertaining to a specific year, whereas another tenant might be uploading documents coming from different company divisions and wants to filter over the documentation of a specific company division.
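As a brief illustration of this per-tenant flexibility, the following sketch uses the AWS SDK for Python (Boto3) to attach a tenant's S3 bucket to that tenant's knowledge base with a tenant-specific chunking configuration. The knowledge base ID, bucket name, and chunking values are assumptions for illustration only:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Hypothetical per-tenant values, typically provisioned through IaC
tenant_kb_id = "<TENANT_KB_ID>"
tenant_bucket_arn = "arn:aws:s3:::tenant-1-documents"

# Attach the tenant's S3 bucket as a data source, with a chunking strategy
# chosen for this tenant's document base (fixed-size chunking in this example)
bedrock_agent.create_data_source(
    knowledgeBaseId=tenant_kb_id,
    name="tenant-1-data-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": tenant_bucket_arn},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 1000,
                "overlapPercentage": 10,
            },
        }
    },
)
```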
Because the silo pattern offers tenant architectural independence, onboarding and offboarding a tenant means creating and destroying the RAG stack for that tenant, composed of the S3 bucket, knowledge base, and OpenSearch Serverless collection. You would typically do this using infrastructure as code (IaC). Depending on your application architecture, you may also need to update the log sinks and monitoring systems for each tenant.
Although the silo pattern offers the highest level of tenant isolation, it is also the most expensive to implement, primarily due to creating a separate OpenSearch Serverless collection per tenant, for the following reasons:
- Minimum capacity charges – Each OpenSearch Serverless collection encrypted with a separate KMS key has a minimum of two OpenSearch Compute Units (OCUs) charged hourly. These OCUs are charged independently from usage, meaning that you will incur charges for dormant tenants if you choose to have a separate KMS encryption key per tenant.
- Scalability overhead – Each collection individually scales OCUs depending on usage, in steps of 6 GB of memory and associated vCPUs and fast access storage. This means that resources might not be fully and optimally utilized across tenants.
When choosing the silo pattern, note that a maximum of 100 knowledge bases are supported in each AWS account. This makes the silo pattern favorable for your largest tenants with specific isolation requirements. Having a separate knowledge base per tenant also reduces the impact of quotas on concurrent ingestion jobs (maximum of one concurrent job per knowledge base, five per account), job size (100 GB per job), and data sources (maximum of 5 million documents per data source). It also improves the performance fairness as perceived by your tenants.
Deleting a knowledge base during tenant offboarding might be time-consuming, depending on the size of the data sources and the synchronization process. To mitigate this, you can set the data deletion policy of your tenants' knowledge bases to RETAIN. This way, the knowledge base deletion process will not delete your tenants' data from the OpenSearch Service index. You can then delete the index by deleting the OpenSearch Serverless collection.
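The following is a minimal Boto3 sketch of such an offboarding flow for the silo pattern; the identifiers are hypothetical, and if you provision these resources with IaC you would instead destroy the tenant's stack:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")
aoss = boto3.client("opensearchserverless")
s3 = boto3.resource("s3")

def offboard_tenant(kb_id: str, collection_id: str, bucket_name: str) -> None:
    # Delete the tenant's data sources, then the knowledge base itself; with the
    # data deletion policy set to RETAIN, the vectors are not purged first
    data_sources = bedrock_agent.list_data_sources(knowledgeBaseId=kb_id)
    for ds in data_sources["dataSourceSummaries"]:
        bedrock_agent.delete_data_source(
            knowledgeBaseId=kb_id, dataSourceId=ds["dataSourceId"]
        )
    bedrock_agent.delete_knowledge_base(knowledgeBaseId=kb_id)

    # Deleting the tenant's collection removes the vector index and its data
    aoss.delete_collection(id=collection_id)

    # Empty (including object versions) and delete the tenant's bucket
    bucket = s3.Bucket(bucket_name)
    bucket.object_versions.all().delete()
    bucket.delete()
```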
Pool
In contrast with the silo pattern, in the pool pattern, illustrated in the following figure, the whole end-to-end RAG architecture is shared by your tenants, making it particularly suitable to accommodate many small tenants.
The pool pattern prescribes the following:
- Single data source – The tenants' data is stored within the same S3 bucket. This implies that the pool model supports a shared KMS key for encryption at rest, not offering the possibility of per-tenant encryption keys. To identify tenant ownership downstream for each document uploaded to Amazon S3, a corresponding JSON metadata file needs to be generated and uploaded. The metadata file generation process can be asynchronous, or even batched for multiple files, because Amazon Bedrock Knowledge Bases requires an explicit triggering of the ingestion job. The metadata file must use the same name as its associated source document file, with .metadata.json appended to the end of the file name, and must be stored in the same folder or location as the source file in the S3 bucket. The following code is an example of the format:
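This is a minimal example of such a metadata file for a document owned by tenant_1, assuming tenantId is the key chosen to express tenancy:

```json
{
    "metadataAttributes": {
        "tenantId": "tenant_1"
    }
}
```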
In the preceding JSON structure, the key tenantId has been deliberately chosen, and can be changed to whichever key you want to use to express tenancy. The tenancy field will be used at runtime to filter documents belonging to a specific tenant, therefore the filtering key used at runtime must match the metadata key in the JSON used to index the documents. Additionally, you can include other metadata keys to perform further filtering that isn't based on tenancy. If you don't upload the object's .metadata.json file, the client application won't be able to find the document using metadata filtering.
- Single knowledge base – A single knowledge base is created to handle the data ingestion for your tenants. This means that your tenants will share the same chunking strategy and embedding model, and share the same KMS key for encryption at rest. Moreover, because ingestion jobs are triggered per data source per knowledge base, you will be restricted to offering your tenants the same data freshness standards.
- Single OpenSearch Serverless collection and index – Your tenant data is pooled in a single OpenSearch Service vector index, therefore your tenants share the same KMS encryption key for vector data and the same HNSW parameters for indexing and querying. Because tenant data isn't physically segregated, it's crucial that the query client be able to filter results for a single tenant. This can be efficiently achieved using either the Amazon Bedrock Knowledge Bases Retrieve or RetrieveAndGenerate API, expressing the tenant filtering condition as part of the retrievalConfiguration (for more details, see Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy). If you want to restrict the vector search to return results for tenant_1, you can implement the client with the AWS SDK for Python (Boto3).
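The following is a minimal sketch of such a client performing RetrieveAndGenerate with a tenant filter; it assumes tenantId is the metadata key used at indexing time and that Boto3 is configured with appropriate credentials and Region:

```python
import boto3

# Runtime client for Amazon Bedrock Knowledge Bases
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def answer_for_tenant(text: str) -> str:
    # text contains the user query; the filter restricts retrieval to tenant_1
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": text},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "<YOUR_KNOWLEDGEBASE_ID>",
                "modelArn": "<FM_ARN>",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "filter": {
                            "equals": {"key": "tenantId", "value": "tenant_1"}
                        }
                    }
                },
            },
        },
    )
    return response["output"]["text"]
```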
text contains the original user query that needs to be answered. Taking into account the document base, <YOUR_KNOWLEDGEBASE_ID> needs to be substituted with the identifier of the knowledge base used to pool your tenants, and <FM_ARN> needs to be substituted with the Amazon Bedrock model Amazon Resource Name (ARN) you want to use to reply to the user query. The client provided in the preceding code has been streamlined to present the tenant filtering functionality. In a production case, we recommend implementing session and error handling, logging and retry logic, and separating the tenant filtering logic from the client invocation to make it inaccessible to developers.
Because the end-to-end architecture is pooled in this pattern, onboarding and offboarding a tenant doesn't require you to create new physical or logical constructs, and it's as simple as starting or stopping the upload of specific tenant documents to Amazon S3. This implies that there is no AWS managed API that can be used to offboard and end-to-end forget a specific tenant. To delete the historical documents belonging to a specific tenant, you can just delete the related objects in Amazon S3. Typically, customers will have an external application that maintains the list of available tenants and their status, facilitating the onboarding and offboarding process.
Sharing the monitoring system and logging capabilities in this pattern reduces the complexity of operations with a large number of tenants. However, it requires you to collect the tenant-specific metrics from the client side to perform specific tenant attribution.
The pool pattern optimizes the end-to-end cost of your RAG architecture, because sharing OCUs across tenants maximizes the use of each OCU and minimizes the tenants' idle time. Sharing the same pool of OCUs across tenants means that this pattern doesn't offer performance isolation at the vector store level, so the largest and most active tenants might impact the experience of other tenants.
When choosing the pool pattern for your RAG architecture, you should be aware that a single ingestion job can ingest or delete a maximum of 100 GB. Additionally, the data source can have a maximum of 5 million documents. If the solution has many tenants that are geographically distributed, consider triggering the ingestion job multiple times a day so you don't hit the ingestion job size limit. Also, depending on the number and size of your documents to be synchronized, the time for ingestion will be determined by the embedding model invocation rate. For example, consider the following scenario:
- Number of tenants to be synchronized = 10
- Average number of documents per tenant = 100
- Average size per document = 2 MB, containing roughly 200,000 tokens divided into 220 chunks of 1,000 tokens to allow for overlap
- Using Amazon Titan Embeddings V2 on demand, allowing for 2,000 RPM and 300,000 TPM
This would result in the following:
- Total embedding requests = 10 * 100 * 220 = 220,000
- Total tokens to process = 10 * 100 * 1,000 = 1,000,000
- Total time taken to embed is dominated by the RPM, therefore 220,000 / 2,000 = 110 minutes, or 1 hour, 50 minutes
This means you could trigger an ingestion job 12 times per day to have a good time distribution of the data to be ingested. This calculation is a best-case scenario and doesn't account for the latency introduced by the FM when creating the vector from the chunk. If you expect to synchronize a large number of tenants at the same time, consider using provisioned throughput to decrease the time it takes to create vector embeddings. This approach will also help distribute the load on the embedding models, limiting throttling of the Amazon Bedrock runtime API calls.
Bridge
The bridge pattern, illustrated in the following figure, strikes a balance between the silo and pool patterns, offering a middle ground that balances tenant data isolation and security.
The bridge pattern delivers the following characteristics:
- Separate data source per tenant in a common S3 bucket – Tenant data is stored in the same S3 bucket, but prefixed by a tenant identifier. Although having a different prefix per tenant doesn't offer the possibility of using per-tenant encryption keys, it does create a logical separation that can be used to segregate data downstream in the knowledge bases.
- Separate knowledge base per tenant – This pattern prescribes creating a separate knowledge base per tenant, similar to the silo pattern. Therefore, the considerations made for the silo pattern apply. Applications built using the bridge pattern usually share query clients across tenants, so they need to identify the specific tenant's knowledge base to query. They can identify the knowledge base by storing the tenant-to-knowledge-base mapping in an external database, which manages tenant-specific configurations. The following example shows how to store this tenant-specific information in an Amazon DynamoDB table:
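The following is a minimal Boto3 sketch, assuming a hypothetical table named tenantKbConfig keyed by tenantId; the attribute names mirror the knowledgebaseId and modelARN parameters discussed next:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table keyed by tenantId, holding per-tenant RAG configuration
tenant_kb_config = dynamodb.Table("tenantKbConfig")

tenant_kb_config.put_item(
    Item={
        "tenantId": "tenant_1",
        "knowledgebaseId": "<TENANT_1_KB_ID>",
        "modelARN": "<FM_ARN>",
    }
)
```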
In a production setting, your application will store tenant-specific parameters belonging to other functionality in your data stores. Depending on your application architecture, you might choose to store knowledgebaseId and modelARN alongside the other tenant-specific parameters, or create a separate data store (for example, the tenantKbConfig table) specifically for your RAG architecture. This mapping can then be used by the client application by invoking the RetrieveAndGenerate API.
API. The next is an instance implementation: - Separate OpenSearch Service index per tenant – You retailer knowledge throughout the identical OpenSearch Serverless assortment, however you create a vector index per tenant. This suggests your tenants share the identical KMS encryption key and the identical pool of OCUs, optimizing the OpenSearch Service sources utilization for indexing and querying. The separation in vector indexes offers you the pliability of selecting totally different HNSM parameters per tenant, letting you tailor the efficiency of your k-NN indexing and querying in your totally different tenants.
The bridge pattern supports up to 100 tenants, and onboarding and offboarding a tenant requires the creation and deletion of a knowledge base and an OpenSearch Service vector index. To delete the data pertaining to a specific tenant, you can delete the created resources and use the tenant-specific prefix as a logical parameter in your Amazon S3 API calls. Unlike the silo pattern, the bridge pattern doesn't allow for per-tenant end-to-end encryption; it offers the same level of tenant customization provided by the silo pattern while optimizing costs.
Summary of differences
The following figure and table provide a consolidated view for comparing the characteristics of the different multi-tenant RAG architecture patterns. This comprehensive overview highlights the key attributes and trade-offs associated with the pool, bridge, and silo patterns, enabling informed decision-making based on specific requirements.
The following figure illustrates the mapping of design characteristics to components of the RAG architecture.
The following table summarizes the characteristics of the multi-tenant RAG architecture patterns.
Attribute | Attribute of | Pool | Bridge | Silo |
Per-tenant chunking strategy | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
Customer managed key for encryption of transient data and at rest | Amazon Bedrock Knowledge Base Data Source | No | No | Yes |
Per-tenant distance measure | Amazon OpenSearch Service Index | No | Yes | Yes |
Per-tenant ANN index configuration | Amazon OpenSearch Service Index | No | Yes | Yes |
Per-tenant data deletion policies | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
Per-tenant vector dimension | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
Tenant performance isolation | Vector database | No | No | Yes |
Tenant onboarding and offboarding complexity | Overall solution | Simplest, requires management of new tenants in existing infrastructure | Medium, requires minimal management of end-to-end infrastructure | Hardest, requires management of end-to-end infrastructure |
Query client implementation | Original data source | Medium, requires dynamic filtering | Hardest, requires external tenant mapping table | Simplest, same as single-tenant implementation |
Amazon S3 tenant management complexity | Amazon S3 buckets and objects | Hardest, need to maintain tenant-specific metadata files for each object | Medium, each tenant needs a different S3 path | Simplest, each tenant requires a different S3 bucket |
Cost | Vector database | Lowest | Medium | Highest |
Per-tenant FM used to create vector embeddings | Amazon Bedrock Knowledge Base | No | Yes | Yes |
Conclusion
This post explored three distinct patterns for implementing a multi-tenant RAG architecture using Amazon Bedrock Knowledge Bases and OpenSearch Service. The silo, pool, and bridge patterns offer varying levels of tenant isolation, variability, management simplicity, and cost-efficiency, catering to different use cases and requirements. By understanding the trade-offs and considerations associated with each pattern, organizations can make informed decisions and choose the approach that best aligns with their needs.
Get started with Amazon Bedrock Knowledge Bases today.
About the Authors
Emanuele Levi is a Solutions Architect in the Enterprise Software and SaaS team, based in London. Emanuele helps UK customers on their journey to refactor monolithic applications into modern microservices SaaS architectures. Emanuele is particularly interested in event-driven patterns and designs, especially when applied to analytics and AI, where he has expertise in the fraud-detection industry.
Mehran Nikoo is a Generative AI Go-To-Market Specialist at AWS. He leads the generative AI go-to-market strategy for the UK and Ireland.
Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on computer vision use cases and helps AWS customers in EMEA accelerate their machine learning and generative AI journeys with Amazon SageMaker and Amazon Bedrock.