Safeguard generative AI applications with Amazon Bedrock Guardrails
Enterprises aiming to automate processes using AI agents or improve employee productivity using AI chat-based assistants need to implement comprehensive safeguards and audit controls for responsible use of AI and processing of sensitive data by large language models (LLMs). Many have developed a custom generative AI gateway or have adopted an off-the-shelf solution (such as LiteLLM or Kong AI Gateway) to provide their AI practitioners and developers with access to LLMs from different providers. However, implementing and maintaining consistent policies for prompt safety and sensitive data protection across a growing list of LLMs from various providers at scale is challenging.
In this post, we demonstrate how you can address these challenges by adding centralized safeguards to a custom multi-provider generative AI gateway using Amazon Bedrock Guardrails. Amazon Bedrock Guardrails provides a suite of safety features that help organizations build responsible generative AI applications at scale. You'll learn how to use the Amazon Bedrock ApplyGuardrail API to help enforce consistent policies for prompt safety and sensitive data protection for LLMs from both Amazon Bedrock and third-party providers such as Microsoft Azure OpenAI. The proposed solution offers the additional benefits of central logging and monitoring, analytics, and a chargeback mechanism.
Solution overview
There are several requirements you need to meet to safeguard generative AI applications with centralized guardrails. First, organizations need a robust and scalable infrastructure setup for the generative AI gateway and its guardrails components. The solution also needs a comprehensive logging and monitoring system to track AI interactions, along with analytics capabilities to assess usage patterns and compliance. For sensitive data protection, organizations need to establish clear data governance policies and implement appropriate security controls. Additionally, they need to develop or integrate a chargeback mechanism to track and allocate AI usage costs across different departments or projects. Knowledge of regulatory requirements specific to their industry is crucial to make sure the guardrails are properly configured to meet compliance standards.
The following diagram depicts a conceptual illustration of our proposed solution. The workflow begins when authenticated users send HTTPS requests to the generative AI gateway, a centralized application running on Amazon Elastic Container Service (Amazon ECS) that serves as the primary interface for LLM interactions. Within the generative AI gateway application logic, each incoming request is first forwarded to the Amazon Bedrock ApplyGuardrail API for content screening. The generative AI gateway then evaluates the content against predefined configurations, deciding whether to block the request entirely, mask sensitive information, or allow it to proceed unmodified.
This evaluation process, integral to the functionality of the generative AI gateway, facilitates adherence to established safety and compliance guidelines. For requests that pass this screening, the generative AI gateway logic determines the appropriate LLM provider (either Amazon Bedrock or a third-party service) based on the user's specifications. The screened content is then forwarded to the chosen LLM for processing. Finally, the generative AI gateway receives the LLM's response and returns it to the user, completing the interaction cycle. The response flow follows two distinct paths: blocked requests result in users receiving a blocked content message, and approved requests deliver the model's response with the required content masking applied to the user prompt. In our implementation, guardrails are only applied to the input or prompt and not to the LLM responses. This streamlined process provides a unified approach to LLM access, security, and compliance for both Amazon Bedrock and third-party providers.
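The screening step above can be sketched with the ApplyGuardrail API and a small decision helper. This is a minimal sketch under stated assumptions: the guardrail ID and the exact block/mask/allow mapping are illustrative, not the repository's actual code.

```python
GUARDRAIL_ID = "gr-example123"  # hypothetical guardrail identifier
GUARDRAIL_VERSION = "1"

def screen_prompt(bedrock_runtime, prompt):
    """Send the user prompt to the ApplyGuardrail API for content screening."""
    return bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",  # screen the prompt only, not the LLM response
        content=[{"text": {"text": prompt}}],
    )

def gateway_decision(result):
    """Map an ApplyGuardrail-style response to the gateway's three outcomes:
    block the request, forward a masked prompt, or forward it unmodified."""
    if result.get("action") != "GUARDRAIL_INTERVENED":
        return ("allow", None)
    outputs = result.get("outputs", [])
    if outputs:  # the guardrail returned rewritten text, e.g. with PII masked
        return ("mask", outputs[0]["text"])
    return ("block", None)  # intervened with nothing to forward: reject
```

In the gateway, `bedrock_runtime` would be a `boto3.client("bedrock-runtime")` instance; on a `("mask", text)` result the gateway forwards the rewritten prompt to the LLM, and on `("block", None)` it returns the blocked content message to the user.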

The generative AI gateway application is hosted on AWS Fargate, and it's built using FastAPI. The application interacts with other Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3), Amazon Bedrock, Amazon Kinesis, and Amazon Data Firehose. The solution includes a robust data persistence layer that captures the interaction details and stores them on Amazon S3 through Amazon Kinesis Data Streams and Amazon Data Firehose. Persisted data includes sanitized requests and responses, transaction records, guardrail metadata, and blocked content with associated metadata. This comprehensive logging provides full auditability and enables continuous improvement of the guardrail mechanisms.
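As a sketch of that persistence layer, a sanitized interaction record might be assembled and pushed to Kinesis Data Streams as follows. The field names and partition key choice are illustrative assumptions, not the repository's actual schema.

```python
import json
import time
import uuid

def build_transaction_record(user_id, model_id, prompt, response_text,
                             guardrail_action, input_tokens, output_tokens):
    """Assemble one sanitized interaction record for persistence to Amazon S3
    via Kinesis Data Streams and Amazon Data Firehose."""
    return {
        "transaction_id": str(uuid.uuid4()),
        "timestamp": int(time.time()),
        "user_id": user_id,
        "model_id": model_id,
        "prompt": prompt,              # already masked by the guardrail
        "response": response_text,
        "guardrail_action": guardrail_action,
        "usage": {"input_tokens": input_tokens, "output_tokens": output_tokens},
    }

def publish_record(kinesis_client, stream_name, record):
    """Put one record on the stream; Firehose batches these into S3 objects."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["user_id"],  # keeps a user's records on one shard
    )
```

Here `kinesis_client` would be a `boto3.client("kinesis")` instance; the token counts captured in `usage` are what later feed the analytics and chargeback queries.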
Solution components
Scalability of the solution is achieved using the following tools and technologies:
- nginx to provide maximum performance and stability of the application by load balancing requests within each container.
- Gunicorn, a Python Web Server Gateway Interface (WSGI) HTTP server commonly used to serve Python web applications in production environments. It's a high-performance server that can handle multiple worker processes and concurrent requests efficiently. Gunicorn supports synchronous communication only but has robust process management functionality.
- Uvicorn to provide lightweight and asynchronous request handling. Although Gunicorn is synchronous, it supports asynchronous worker types such as Uvicorn, with which asynchronous communication can be established. This is needed for applications with longer wait times, and when fetching responses from LLMs, you should expect higher wait times.
- FastAPI to serve the actual requests at the generative AI gateway application layer.
- An Amazon ECS Fargate cluster to host the containerized application on AWS, and AWS Auto Scaling to scale the tasks or containers up or down automatically.
- Amazon Elastic Container Registry (Amazon ECR) for storing the Docker image of the generative AI gateway application.
- Elastic Load Balancing (ELB) and Application Load Balancer for load balancing requests across ECS containers.
- HashiCorp Terraform for resource provisioning.
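The Gunicorn/Uvicorn pairing above can be captured in a `gunicorn.conf.py` file (Gunicorn configuration files are plain Python). This is a minimal sketch under stated assumptions: the FastAPI module path, port, and worker sizing are placeholders, not the repository's actual configuration.

```python
# gunicorn.conf.py: run async Uvicorn workers under Gunicorn's process manager
import multiprocessing

wsgi_app = "app.main:app"  # hypothetical FastAPI application module path
worker_class = "uvicorn.workers.UvicornWorker"  # async workers for long LLM waits
workers = multiprocessing.cpu_count() * 2 + 1   # common worker-sizing heuristic
timeout = 120              # generous timeout to accommodate slow LLM responses
bind = "0.0.0.0:8080"      # nginx proxies requests to this port in the container
```

Running `gunicorn -c gunicorn.conf.py` then gives Gunicorn's process management with Uvicorn's asynchronous request handling inside each worker.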
The following figure illustrates the architecture design of the proposed solution. Client applications (such as an on-premises enterprise app, inference app, Streamlit app, and Amazon SageMaker Studio Lab), the dashboard, and Azure Cloud components aren't included in the accompanying GitHub repository. They are included in the architecture diagram to demonstrate integrations with downstream and upstream systems.

Centralized guardrails
The generative AI gateway enforces comprehensive security controls through Amazon Bedrock Guardrails, using the ApplyGuardrail API to implement multiple layers of protection. These guardrails provide four core safety features: content filtering to screen inappropriate or harmful content, denied topics to help prevent specific subject matter discussions, word filters to block specific words or phrases, and sensitive information detection to help protect personal and confidential data.
Organizations can implement these controls using three configurable strength levels: low, medium, and high. This way, business units can align their AI security posture with their specific risk tolerance and compliance requirements. For example, a marketing team might operate with low-strength guardrails for creative content generation, whereas financial or healthcare divisions might require high-strength guardrails for handling sensitive customer data. Beyond these basic protections, Amazon Bedrock Guardrails also includes advanced features such as contextual grounding and Automated Reasoning checks, which help detect and prevent AI hallucinations (instances where models generate false or misleading information). Users can extend the functionality of the generative AI gateway to support these advanced features based on their use case.
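As an illustration of the strength tiers, the following sketch builds a `contentPolicyConfig` for a single strength level, as you might pass to the CreateGuardrail API. The filter list follows the Bedrock Guardrails content filter categories; note that the prompt-attack filter applies to input only, so its output strength is pinned to NONE. Treat this as a sketch, not the solution's actual guardrail definition.

```python
# Bedrock Guardrails content filter categories
FILTER_TYPES = ["SEXUAL", "VIOLENCE", "HATE", "INSULTS", "MISCONDUCT", "PROMPT_ATTACK"]

def content_policy(strength):
    """Build a contentPolicyConfig applying one strength tier (LOW, MEDIUM,
    or HIGH) uniformly across the content filter categories."""
    strength = strength.upper()
    if strength not in ("LOW", "MEDIUM", "HIGH"):
        raise ValueError("strength must be low, medium, or high")
    return {
        "filtersConfig": [
            {
                "type": f,
                "inputStrength": strength,
                # PROMPT_ATTACK is input-only; the API requires NONE on output
                "outputStrength": "NONE" if f == "PROMPT_ATTACK" else strength,
            }
            for f in FILTER_TYPES
        ]
    }
```

A marketing team's guardrail could then be created with `content_policy("low")` and a finance team's with `content_policy("high")`, keeping the tier choice in one place.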
Multi-provider integration
The generative AI gateway is both LLM provider and model-agnostic, which enables seamless integration with multiple providers and LLMs. Users can specify their preferred LLM model directly in the request payload, allowing the gateway to route requests to the appropriate model endpoint. AWS Secrets Manager is used for storing the generative AI gateway API access tokens and access tokens for third-party LLMs such as Azure OpenAI. The generative AI gateway API token is used for authenticating the caller. The LLM access token is used for establishing a client connection to third-party providers.
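A minimal sketch of the routing and credential lookup might look like the following; the model-naming convention and secret names are assumptions for illustration, not the gateway's actual logic.

```python
def resolve_provider(model_id):
    """Pick a provider from the model named in the request payload.
    Bedrock model IDs carry a vendor prefix (anthropic., amazon., meta., ...);
    anything else is routed to the third-party path in this sketch."""
    if "." in model_id:
        return "bedrock"
    return "azure_openai"  # e.g. gpt-4o deployments

def get_llm_token(secrets_client, secret_name):
    """Fetch a third-party LLM access token from AWS Secrets Manager."""
    return secrets_client.get_secret_value(SecretId=secret_name)["SecretString"]
```

Here `secrets_client` would be a `boto3.client("secretsmanager")` instance; the Bedrock path needs no stored token because the gateway's task role authorizes those calls directly.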
Logging, monitoring, and alerting
A key advantage of implementing a generative AI gateway is its centralized approach to logging and monitoring LLM interactions. Every interaction, including user requests and prompts, LLM responses, and user context, is captured and stored in a standardized format and location. Organizations can use this collection strategy to perform analysis, troubleshoot issues, and derive insights. Logging, monitoring, and alerting are enabled using the following AWS services:
- Amazon CloudWatch captures the container and application logs. We can create custom metrics on specific log messages and create an alarm that can be used for proactive alerting (for example, when a 500 Internal Server Error occurs).
- Amazon Simple Notification Service (Amazon SNS) for notifications to a distribution list (for example, when a 500 Internal Server Error occurs).
- Kinesis Data Streams and Data Firehose for streaming request and response data and metadata to Amazon S3 (for compliance and analytics or chargeback). Chargeback is a mechanism to attribute costs to a hierarchy of owners. For instance, an application running on AWS would incur some costs for every service; however, the application might be serving an employee working on a project governed by a business unit. Chargeback is a process where costs can be attributed at the lowest level to an individual user, with the ability to roll up at multiple intermediate levels all the way to the business unit.
- Amazon S3 for persisting requests and responses at the transaction level (for compliance), along with transaction metadata and metrics (for example, token counts) for analytics and chargeback.
- AWS Glue Crawler API and Amazon Athena for exposing a SQL table of transaction metadata for analytics and chargeback.
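The chargeback roll-up described above reduces to a simple aggregation once transaction metadata lands in S3 and becomes queryable. The following sketch, with illustrative record fields, totals per-transaction costs at the user, project, and business unit levels:

```python
from collections import defaultdict

def roll_up_costs(transactions):
    """Aggregate per-transaction LLM costs from individual users up through
    projects to business units. Record fields are illustrative, not the
    solution's actual transaction schema."""
    totals = {
        "user": defaultdict(float),
        "project": defaultdict(float),
        "business_unit": defaultdict(float),
    }
    for t in transactions:
        for level in totals:
            totals[level][t[level]] += t["cost_usd"]
    return totals
```

In practice this aggregation would run as an Athena query over the Glue table rather than in Python, but the roll-up logic is the same.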
Repository structure
The GitHub repository contains the following directories and files:
Prerequisites
You need the following prerequisites before deploying this solution:
- An AWS account
- An AWS Identity and Access Management (IAM) role with the following permissions:
- Amazon S3 access (CreateBucket, PutObject, GetObject, DeleteObject)
- AWS Secrets Manager access
- Amazon CloudWatch Logs access
- Amazon Bedrock service access
- Amazon Bedrock foundation model (FM) access
- IAM permissions for Amazon Bedrock Guardrails
- Access to the serverless FMs on Amazon Bedrock is automatically enabled. You don't need to manually request or enable model access, but you can use IAM policies and service control policies to restrict model access as needed.
- External LLM endpoints configured in the customer environment. For example, Azure OpenAI endpoints need to be created in the customer Azure account with the following naming convention:
{model_name}-{azure_tier}-{azure_region}. For example, {gpt-4o}-{dev}-{eastus}.
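A small helper can enforce this naming convention when the gateway composes the Azure OpenAI deployment name; the function name itself is hypothetical.

```python
def azure_deployment_name(model_name, azure_tier, azure_region):
    """Compose an Azure OpenAI endpoint name following the
    {model_name}-{azure_tier}-{azure_region} convention above."""
    return f"{model_name}-{azure_tier}-{azure_region}"
```

For example, `azure_deployment_name("gpt-4o", "dev", "eastus")` yields `gpt-4o-dev-eastus`, matching the convention shown above.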
Deploy the solution
In the deployment guide provided in this section, we assume that the deployment instructions cover the steps for a dev environment. Similar steps can be used for higher environments.
To safeguard generative AI applications with centralized guardrails, follow these steps:
- Clone the GitHub repository and make sure environment variables for AWS authentication are available in your environment.
- Run ./deploy.sh, which automatically sets up a Terraform state bucket, creates an IAM policy for Terraform, and provisions the infrastructure with dependencies.
- Run ./verify.sh to verify the deployment and confirm that the environment is ready for testing.
- Follow the instructions in the README, Auth Token Generation for Consumers, to generate consumer authorization tokens.
- Follow the instructions in the README, Testing the Gateway, to test your own generative AI gateway.
For development and testing, the entire setup can be completed on a developer laptop, with the generative AI gateway server and the client running on the user's laptop, by following the local setup instructions in the README.
Examples
In this first example, the following code sample is a curl command that invokes the anthropic.claude-3-sonnet-20240229-v1:0 model with a high-strength guardrail to demonstrate how the generative AI gateway guardrails perform against denied topics. It illustrates the effectiveness of the safety mechanism in blocking denied topics by asking the model, I want to sell my house and invest the proceeds in a single stock. Which stock should I buy?:
The following sample code is the output from the preceding curl command. This result includes the model's generated text and the modifications or interventions applied by the high-strength guardrails. Examining this output helps verify the effectiveness of the guardrails and makes sure that the model's response aligns with the required safety and compliance parameters:
The second example tests the ability of the generative AI gateway to help protect sensitive personal information. It simulates a user query containing personally identifiable information (PII) such as a name, Social Security number, and email address.
In this case, the guardrail successfully intervened and masked the PII data before sending the user query to the LLM, as evidenced by the guardrail_action field, indicating the sensitiveInformationPolicy was applied:
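To see which entities the sensitiveInformationPolicy acted on, the gateway can read the assessments section of the ApplyGuardrail response. The helper below follows the API's assessment shape; the sample dictionary in the usage is illustrative, not the actual output from the example above.

```python
def pii_findings(result):
    """Extract (entity type, action) pairs from the sensitiveInformationPolicy
    assessment of an ApplyGuardrail-style response, e.g. for audit logging."""
    findings = []
    for assessment in result.get("assessments", []):
        policy = assessment.get("sensitiveInformationPolicy", {})
        for entity in policy.get("piiEntities", []):
            findings.append((entity["type"], entity["action"]))
    return findings
```

For a response whose assessment lists an anonymized Social Security number and email address, `pii_findings` would return pairs such as `("US_SOCIAL_SECURITY_NUMBER", "ANONYMIZED")` and `("EMAIL", "ANONYMIZED")`, which the gateway can log alongside the masked prompt.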
For more comprehensive test scripts, refer to the /test directory of the repository. These additional scripts offer a wider range of test cases and scenarios to thoroughly evaluate the functionality and performance of the generative AI gateway.
Clean up
When you've finished exploring this solution, you can clean up the resources by following these steps:
- Run terraform destroy to delete the resources provisioned by Terraform.
- (Optional) From the AWS Management Console or the AWS Command Line Interface (AWS CLI), delete resources that aren't deleted by Terraform (such as the S3 bucket, ECR repository, and EC2 subnet).
Cost estimation
This section describes the underlying cost structure for running the solution. When implementing this solution, there are several cost categories to consider:
- LLM provider costs – These represent the charges for using foundation models through various providers, including models hosted on Amazon Bedrock and third-party providers. Costs are typically calculated based on:
- Number of input and output tokens processed
- Model complexity and capabilities
- Usage volume and patterns
- Service level requirements
- AWS infrastructure costs – These cover the infrastructure expenses associated with the generative AI gateway:
- Compute resources (Amazon ECS Fargate)
- Load balancing (Application Load Balancer)
- Storage (Amazon S3, Amazon ECR)
- Monitoring (Amazon CloudWatch)
- Data processing (Amazon Kinesis)
- Security services (AWS Secrets Manager)
- Amazon Bedrock Guardrails costs – These are specific charges for implementing safety and compliance features:
- Content filtering and moderation
- Policy enforcement
- Sensitive data protection
The following tables show a sample cost breakdown for deploying and using the generative AI gateway. For actual pricing, refer to the AWS Pricing Calculator.
Infrastructure costs:

| Service | Estimated usage | Estimated monthly cost |
| --- | --- | --- |
| Amazon ECS Fargate | 2 tasks, 1 vCPU, 2 GB RAM, running constantly | $70–$100 |
| Application Load Balancer | 1 ALB, running constantly | $20–$30 |
| Amazon ECR | Storage for Docker images | $1–$5 |
| AWS Secrets Manager | Storing API keys and tokens | $0.40 per secret per month |
| Amazon CloudWatch | Log storage and metrics | $10–$20 |
| Amazon SNS | Notifications | $1–$2 |
| Amazon Kinesis Data Streams | 1 stream, low volume | $15–$25 |
| Amazon Data Firehose | 1 delivery stream | $0.029 per GB processed |
| Amazon S3 | Storage for logs and data | $2–$5 |
| AWS Glue | Crawler runs (assuming weekly) | $5–$10 |
| Amazon Athena | Query execution | $1–$5 |
LLM and guardrails costs:

| Service | Estimated usage | Estimated monthly cost |
| --- | --- | --- |
| Amazon Bedrock Guardrails | 10,000 API calls per month | $10–$20 |
| Claude 3 Sonnet (input) | 1M tokens per month at $0.003 per 1K tokens | $3 |
| Claude 3 Sonnet (output) | 500K tokens per month at $0.015 per 1K tokens | $7.50 |
| GPT-4 Turbo (Azure OpenAI, input) | 1M tokens per month at $0.01 per 1K tokens | $10 |
| GPT-4 Turbo (Azure OpenAI, output) | 500K tokens per month at $0.03 per 1K tokens | $15 |
| Total estimated cost | | $170–$260 (base) |
LLM costs can vary significantly based on the number of API calls, input/output token lengths, model selection, and volume discounts. We consider a moderate usage scenario to be about 50–200 queries per day, with an average input length of 500 tokens and an average output length of 250 tokens. These costs could increase significantly with higher query volumes, longer conversations, use of more expensive models, and multiple model calls per request.
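The moderate-usage scenario above can be checked with a short calculation; the prices are the per-1K-token rates from the table, and the 30-day month is an assumption.

```python
def monthly_llm_cost(queries_per_day, avg_in_tokens, avg_out_tokens,
                     in_price_per_1k, out_price_per_1k, days=30):
    """Estimate monthly token spend for one model."""
    total_in = queries_per_day * days * avg_in_tokens    # input tokens per month
    total_out = queries_per_day * days * avg_out_tokens  # output tokens per month
    return (total_in / 1000) * in_price_per_1k + (total_out / 1000) * out_price_per_1k
```

For example, 100 queries per day at 500 input and 250 output tokens against Claude 3 Sonnet works out to 1.5M input and 750K output tokens, or roughly $15.75 per month in token charges.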
Conclusion
Centralized guardrails integrated with a custom multi-provider generative AI gateway offer a robust and scalable approach for enterprises to safely use LLMs while maintaining security and compliance standards. Through its implementation of the Amazon Bedrock Guardrails ApplyGuardrail API, the solution provides consistent policy enforcement for prompt safety and sensitive data protection across both Amazon Bedrock and third-party LLM providers.
Key advantages of this solution include:
- Centralized guardrails with configurable protection levels
- Multi-provider LLM integration capabilities
- Comprehensive logging and monitoring features
- Production-grade scalability through containerization
- Built-in compliance and audit capabilities
Organizations, particularly those in highly regulated industries, can use this architecture to adopt and scale their generative AI implementations while maintaining control over data protection and meeting AI safety regulations. The solution's flexible design and robust infrastructure make it a valuable tool for enterprises that want to safely harness the power of generative AI while managing the associated risks.
About the authors
Hasan Shojaei, PhD, is a Sr. Data Scientist with AWS Professional Services, where he helps customers across different industries such as sports, financial services, and manufacturing solve their business challenges using advanced AI/ML technologies. Outside of work, Hasan is passionate about books, photography, and snowboarding.
Sunita Koppar is a Senior Specialist Solutions Architect in Generative AI and Machine Learning at AWS, where she partners with customers across diverse industries to design solutions, build proofs of concept, and drive measurable business outcomes. Beyond her professional role, she is deeply passionate about learning and teaching Sanskrit, actively engaging with student communities to help them upskill and grow.
Anuja Narwadkar is a Global Senior Engagement Manager in AWS Professional Services, specializing in enterprise-scale machine learning and generative AI transformations. She leads ProServe teams in strategizing, architecting, and building transformative AI/ML solutions on AWS for large enterprises across industries, including financial services. Beyond her professional role, she loves to drive AI upskilling initiatives, especially for women, read, and cook.
Krishnan Gopalakrishnan is a Delivery Consultant at AWS Professional Services with 12+ years in enterprise data architecture and AI/ML engineering. He architects cutting-edge data solutions for Fortune 500 companies, building mission-critical pipelines and generative AI implementations across retail, healthcare, fintech, and manufacturing. Krishnan focuses on scalable, cloud-native architectures that transform enterprise data into actionable AI-powered insights, enabling measurable business outcomes through data-driven decision making.
Bommi Shin is a Delivery Consultant with AWS Professional Services, where she helps enterprise customers implement secure, scalable artificial intelligence solutions using cloud technologies. She specializes in designing and building AI/ML and generative AI platforms that address complex business challenges across a range of industries. Outside of work, she enjoys traveling, exploring nature, and delicious food.