Build RAG-based generative AI applications in AWS using Amazon FSx for NetApp ONTAP with Amazon Bedrock


This post is co-written with Michael Shaul and Sasha Korman from NetApp.

Generative artificial intelligence (AI) applications are commonly built using a technique called Retrieval Augmented Generation (RAG) that provides foundation models (FMs) access to additional data they didn’t have during training. This data is used to enrich the generative AI prompt to deliver more context-specific and accurate responses without continuously retraining the FM, while also improving transparency and minimizing hallucinations.

In this post, we demonstrate a solution using Amazon FSx for NetApp ONTAP with Amazon Bedrock to provide a RAG experience for your generative AI applications on AWS by bringing company-specific, unstructured user file data to Amazon Bedrock in a straightforward, fast, and secure way.

Our solution uses an FSx for ONTAP file system as the source of unstructured data and continuously populates an Amazon OpenSearch Serverless vector database with the user’s existing files and folders and associated metadata. This enables a RAG scenario with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data retrieved from the OpenSearch Serverless vector database.

When developing generative AI applications such as a Q&A chatbot using RAG, customers are also concerned about keeping their data secure and preventing end-users from querying information from unauthorized data sources. Our solution also uses FSx for ONTAP to allow users to extend their existing data security and access mechanisms to augment model responses from Amazon Bedrock. We use FSx for ONTAP as the source of associated metadata, specifically the user’s security access control list (ACL) configurations attached to their files and folders, and populate that metadata into OpenSearch Serverless. By combining access control operations with file events that notify the RAG application of new and changed data on the file system, our solution demonstrates how FSx for ONTAP enables Amazon Bedrock to only use embeddings from authorized files for the specific users that connect to our generative AI application.

AWS serverless services make it straightforward to focus on building generative AI applications by providing automatic scaling, built-in high availability, and a pay-for-use billing model. Event-driven compute with AWS Lambda is a good fit for compute-intensive, on-demand tasks such as document embedding and flexible large language model (LLM) orchestration, and Amazon API Gateway provides an API interface that allows for pluggable frontends and event-driven invocation of the LLMs. Our solution also demonstrates how to build a scalable, automated, API-driven serverless application layer on top of Amazon Bedrock and FSx for ONTAP using API Gateway and Lambda.

Solution overview

The solution provisions an FSx for ONTAP Multi-AZ file system with a storage virtual machine (SVM) joined to an AWS Managed Microsoft AD domain. An OpenSearch Serverless vector search collection provides a scalable and high-performance similarity search capability. We use an Amazon Elastic Compute Cloud (Amazon EC2) Windows server as an SMB/CIFS client to the FSx for ONTAP volume and configure data sharing and ACLs for the SMB shares in the volume. We use this data and ACLs to test permissions-based access to the embeddings in a RAG scenario with Amazon Bedrock.

The embeddings container component of our solution is deployed on an EC2 Linux server and mounted as an NFS client on the FSx for ONTAP volume. It periodically migrates existing files and folders along with their security ACL configurations to OpenSearch Serverless. It populates an index in the OpenSearch Serverless vector search collection with company-specific data (and associated metadata and ACLs) from the NFS share on the FSx for ONTAP file system.

The solution implements a RAG Retrieval Lambda function that enables RAG with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data and associated metadata (including ACLs) retrieved from the OpenSearch Serverless index that was populated by the embeddings container component. The RAG Retrieval Lambda function stores conversation history for the user interaction in an Amazon DynamoDB table.
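
To make the idea of prompt enrichment concrete, here is a minimal sketch of how retrieved text and recent conversation turns might be combined into one prompt. It is not the Lambda function's actual code: the `Turn` class, the `build_prompt` function, and the prompt wording are all hypothetical, and the history is a plain list rather than a DynamoDB table.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One question/answer exchange from the stored conversation history."""
    question: str
    answer: str


def build_prompt(question, context_chunks, history, memory_window=10):
    """Assemble an enriched prompt from retrieved context and recent history.

    All names and the prompt template are illustrative assumptions.
    """
    # Keep only the most recent turns; a window of 0 drops history entirely
    # (history[-0:] would otherwise return the whole list).
    recent = history[-memory_window:] if memory_window > 0 else []
    history_text = "\n".join(
        f"User: {turn.question}\nAssistant: {turn.answer}" for turn in recent
    )
    context_text = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Conversation so far:\n{history_text}\n\n"
        f"Question: {question}"
    )
```

Capping the history at a fixed number of turns keeps the prompt size bounded as a session grows.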

End-users interact with the solution by submitting a natural language prompt either through a chatbot application or directly through the API Gateway interface. The chatbot application container is built using Streamlit and fronted by an AWS Application Load Balancer (ALB). When a user submits a natural language prompt to the chatbot UI using the ALB, the chatbot container interacts with the API Gateway interface that then invokes the RAG Retrieval Lambda function to fetch the response for the user. The user can also directly submit prompt requests to API Gateway and obtain a response. We demonstrate permissions-based access to the RAG documents by explicitly retrieving the SID of a user and then using that SID in the chatbot or API Gateway request, where the RAG Retrieval Lambda function then matches the SID to the Windows ACLs configured for the document. As an additional authentication step in a production environment, you may want to also authenticate the user against an identity provider and then match the user against the permissions configured for the documents.
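
The SID matching described above can be pictured with a short sketch. This is an illustration under stated assumptions, not the solution's implementation: the `acl` field name, the document shape, and both function names are hypothetical. The only fixed fact used is that `S-1-1-0` is the well-known Windows SID for the Everyone group.

```python
EVERYONE_SID = "S-1-1-0"  # well-known Windows SID for the Everyone group


def is_authorized(doc_acl, user_sid):
    """Return True if the ACL grants access to Everyone or to this user SID."""
    if EVERYONE_SID in doc_acl:
        return True
    return user_sid is not None and user_sid in doc_acl


def filter_by_acl(docs, user_sid):
    """Keep only the retrieved documents the caller may read.

    Each document is assumed (hypothetically) to carry an "acl" list of SIDs.
    A user_sid of None models a caller who supplied no SID.
    """
    return [doc for doc in docs if is_authorized(doc.get("acl", []), user_sid)]
```

With this shape, a caller without a SID sees only documents shared with Everyone, while a caller whose SID appears in a document's ACL also sees that document.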

The following diagram illustrates the end-to-end flow for our solution. We start by configuring data sharing and ACLs with FSx for ONTAP, and then these are periodically scanned by the embeddings container. The embeddings container splits the documents into chunks and uses the Amazon Titan Embeddings model to create vector embeddings from these chunks. It then stores these vector embeddings with associated metadata in our vector database by populating an index in a vector collection in OpenSearch Serverless.

[Figure: end-to-end embedding flow for the FSx for ONTAP and Amazon Bedrock integration]
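
The chunk-and-embed step in this flow can be sketched as follows. This is a simplified illustration, not the embeddings container's code: the character-based chunking, the record field names, and the `embed` callable are all assumptions. In the real solution the embedding comes from the Amazon Titan Embeddings model; here it is passed in as a function so the sketch stays self-contained.

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def build_index_records(path, text, acl, embed):
    """Pair each chunk with its embedding and the file's metadata.

    `embed` is a hypothetical callable mapping a string to a vector;
    the field names are illustrative, not the solution's index schema.
    """
    return [
        {"path": path, "chunk_id": n, "text": chunk, "acl": acl, "vector": embed(chunk)}
        for n, chunk in enumerate(split_into_chunks(text))
    ]
```

Overlapping chunks reduce the chance that a sentence split across a boundary is lost to retrieval, at the cost of storing some text twice.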

The following architecture diagram illustrates the various components of our solution.

[Figure: overall architecture diagram describing all the components of the solution]

Prerequisites

Complete the following prerequisite steps:

  1. Make sure you have model access in Amazon Bedrock. In this solution, we use Anthropic Claude v3 Sonnet on Amazon Bedrock.
  2. Install the AWS Command Line Interface (AWS CLI).
  3. Install Docker.
  4. Install Terraform.

Deploy the solution

The solution is available for download on this GitHub repo. Cloning the repository and using the Terraform template will provision all the components with their required configurations.

  1. Clone the repository for this solution:
    sudo yum install -y unzip
    git clone https://github.com/aws-samples/genai-bedrock-fsxontap.git
    cd genai-bedrock-fsxontap/terraform

  2. From the terraform folder, deploy the entire solution using Terraform:
    terraform init
    terraform apply -auto-approve

This process can take 15–20 minutes to complete. When finished, the output of the terraform commands should look like the following:

api-invoke-url = "https://9ng1jjn8qi.execute-api.<region>.amazonaws.com/prod"
fsx-management-ip = toset([
  "198.19.255.230",
])
fsx-secret-id = "arn:aws:secretsmanager:<region>:<account-id>:secret:AmazonBedrock-FSx-NetAPP-ONTAP-a2fZEdIt-0fBcS9"
fsx-svm-smb-dns-name = "BRSVM.BEDROCK-01.COM"
lb-dns-name = "chat-load-balancer-2040177936.<region>.elb.amazonaws.com"
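
If you want to reuse these values in scripts, Terraform can also print them as JSON with `terraform output -json`, where each output name maps to an object containing a `value` field. The following is a small sketch of reading that JSON; the function name `parse_terraform_outputs` is hypothetical and not part of the repository.

```python
import json


def parse_terraform_outputs(raw_json):
    """Map each Terraform output name to its plain value.

    Expects the text printed by `terraform output -json`, in which every
    output is an object with "sensitive", "type", and "value" keys.
    """
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}
```

One way to obtain the input, assuming you run it from the terraform folder, is `subprocess.run(["terraform", "output", "-json"], capture_output=True, text=True, check=True).stdout`. Set-typed outputs such as fsx-management-ip come back as JSON lists.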

Load data and set permissions

To test the solution, we will use the EC2 Windows server (ad_host) mounted as an SMB/CIFS client to the FSx for ONTAP volume to share sample data and set user permissions that will then be used to populate the OpenSearch Serverless index by the solution’s embedding container component. Perform the following steps to mount your FSx for ONTAP SVM data volume as a network drive, upload data to this shared network drive, and set permissions based on Windows ACLs:

  1. Obtain the ad_host instance DNS from the output of your Terraform template.
  2. Navigate to AWS Systems Manager Fleet Manager on your AWS console, find the ad_host instance and follow instructions here to log in with Remote Desktop. Use the domain admin user bedrock-01\Admin and obtain the password from AWS Secrets Manager. You can find the password using the Secrets Manager fsx-secret-id secret id from the output of your Terraform template.
  3. To mount an FSx for ONTAP data volume as a network drive, under This PC, choose (right-click) Network and then choose Map Network drive.
  4. Choose the drive letter and use the FSx for ONTAP share path for the mount
    (\\<svm>.<domain>\c$\<volume-name>).
  5. Upload the Amazon Bedrock User Guide to the shared network drive and set permissions to the admin user only (make sure that you disable inheritance under Advanced).
  6. Upload the Amazon FSx for ONTAP User Guide to the shared drive and make sure permissions are set to Everyone.
  7. On the ad_host server, open the command prompt and enter the following command to obtain the SID for the admin user:
    wmic useraccount where name="Admin" get sid
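
The command prints a header line followed by the SID. If you capture that output in a script, a regular expression can pull the SID out before you paste it into a request. The sketch below is a convenience helper only; `extract_sid` and `SID_PATTERN` are hypothetical names, and the pattern covers the common `S-1-...` textual form rather than every possible SID encoding.

```python
import re

# Matches textual SIDs such as S-1-1-0 or S-1-5-21-<domain parts>-<RID>.
SID_PATTERN = re.compile(r"S-1-\d+(?:-\d+)+")


def extract_sid(command_output):
    """Return the first SID-like token in the text, or None if there is none."""
    match = SID_PATTERN.search(command_output)
    return match.group(0) if match else None
```

Returning None instead of raising lets the caller decide how to handle an account that was not found.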

Test permissions using the chatbot

To test permissions using the chatbot, obtain the lb-dns-name URL from the output of your Terraform template and access it through your web browser.

For the prompt query, ask any general question on the FSx for ONTAP user guide that is available for access to everyone. In our scenario, we asked “How can I create an FSx for ONTAP file system,” and the model replied with detailed steps and source attribution in the chat window to create an FSx for ONTAP file system using the AWS Management Console, AWS CLI, or FSx API.

Now, let’s ask a question about the Amazon Bedrock user guide that is available for admin access only. In our scenario, we asked “How do I use foundation models with Amazon Bedrock,” and the model replied that it doesn’t have enough information to provide a detailed answer to the question.

Use the admin SID on the user (SID) filter search in the chat UI and ask the same question in the prompt. This time, the model should respond with steps detailing how to use FMs with Amazon Bedrock and provide the source attribution used by the model for the response.

Test permissions using API Gateway

You can also query the model directly using API Gateway. Obtain the api-invoke-url parameter from the output of your Terraform template.

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "What is an FSxN ONTAP filesystem?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "NA", "memory_window": 10}'

The preceding request sets the value of the metadata parameter to NA to indicate Everyone access, which is sufficient for a query related to the FSx for ONTAP user guide. To query the admin-only Amazon Bedrock user guide, invoke API Gateway again with the metadata parameter set to the admin user’s SID:

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "what is bedrock?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "S-1-5-21-4037439088-1296877785-2872080499-1112", "memory_window": 10}'
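
If you prefer to call the API from Python rather than curl, the same request can be assembled with the standard library. This is a sketch that mirrors the fields shown in the curl commands; the helper name `build_rag_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request


def build_rag_request(api_invoke_url, prompt, metadata="NA", session_id="1"):
    """Build (but do not send) a POST request mirroring the curl examples.

    metadata is "NA" for Everyone access, or a user SID for restricted files.
    """
    payload = {
        "session_id": session_id,
        "prompt": prompt,
        "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
        "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500},
        "metadata": metadata,
        "memory_window": 10,
    }
    return urllib.request.Request(
        # The path spelling matches the endpoint used in the curl commands.
        url=f"{api_invoke_url.rstrip('/')}/bedrock_rag_retreival",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body. Keeping construction separate from sending makes the payload easy to inspect before any network call is made.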

Cleanup

To avoid recurring charges, clean up your account after trying the solution. From the terraform folder, delete the Terraform template for the solution:

terraform apply --destroy

Conclusion

In this post, we demonstrated a solution that uses FSx for ONTAP with Amazon Bedrock and uses FSx for ONTAP support for file ownership and ACLs to provide permissions-based access in a RAG scenario for generative AI applications. Our solution enables you to build generative AI applications with Amazon Bedrock where you can enrich the generative AI prompt in Amazon Bedrock with your company-specific, unstructured user file data from an FSx for ONTAP file system. This solution enables you to deliver more relevant, context-specific, and accurate responses while also making sure only authorized users have access to that data. Finally, the solution demonstrates the use of AWS serverless services with FSx for ONTAP and Amazon Bedrock that enable automatic scaling, event-driven compute, and API interfaces for your generative AI applications on AWS.

For more information about how to get started building with Amazon Bedrock and FSx for ONTAP, refer to the following resources:


About the authors

Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solution architecture for ISV customers and partners at AWS. Kanishk specializes in containers, cloud operations, migrations and modernizations, AI/ML, resilience and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.

Michael Shaul is a Principal Architect at NetApp’s office of the CTO. He has over 20 years of experience building data management systems, applications, and infrastructure solutions. He has a unique in-depth perspective on cloud technologies, builder, and AI solutions.

Sasha Korman is a tech visionary leader of dynamic development and QA teams across Israel and India. With 14 years at NetApp that began as a programmer, his hands-on experience and leadership have been pivotal in steering complex projects to success, with a focus on innovation, scalability, and reliability.
