Streamline AWS resource troubleshooting with Amazon Bedrock Agents and AWS Support Automation Workflows

As AWS environments grow in complexity, troubleshooting issues with resources can become a daunting task. Manually investigating and resolving problems can be time-consuming and error-prone, especially when dealing with intricate systems. Fortunately, AWS offers a powerful tool called AWS Support Automation Workflows (SAW), which is a collection of curated AWS Systems Manager self-service automation runbooks. These runbooks are created by AWS Support Engineering with best practices learned from solving customer issues. They enable AWS customers to troubleshoot, diagnose, and remediate common issues with their AWS resources.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Because Amazon Bedrock is serverless, you don't have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.
In this post, we explore how to use Amazon Bedrock Agents and AWS Support Automation Workflows together to create an intelligent agent capable of troubleshooting issues with AWS resources.
Solution overview
Although the solution is versatile and can be adapted to use a variety of AWS Support Automation Workflows, we focus on a specific example: troubleshooting an Amazon Elastic Kubernetes Service (Amazon EKS) worker node that failed to join a cluster. The following diagram provides a high-level overview of troubleshooting agents with Amazon Bedrock.
Our solution is built around the following key components that work together to provide a seamless and efficient troubleshooting experience:
- Amazon Bedrock Agents – Amazon Bedrock Agents acts as the intelligent interface between users and AWS Support Automation Workflows. It processes natural language queries to understand the issue context and manages the conversation flow to gather required information. The agent uses Anthropic's Claude 3.5 Sonnet model for advanced reasoning and response generation, enabling natural interactions throughout the troubleshooting process.
- Amazon Bedrock agent action groups – These action groups define the structured API operations that the Amazon Bedrock agent can invoke. Using OpenAPI specifications, they define the interface between the agent and AWS Lambda functions, specifying the available operations, required parameters, and expected responses. Each action group contains the API schema that tells the agent how to properly format requests and interpret responses when interacting with Lambda functions.
- Lambda function – The Lambda function acts as the integration layer between the Amazon Bedrock agent and AWS Support Automation Workflows. It validates input parameters from the agent and initiates the appropriate SAW runbook execution. It monitors the automation progress while processing the technical output into a structured format. When the workflow is complete, it returns formatted results to the agent for presentation to the user.
- IAM role – The AWS Identity and Access Management (IAM) role provides the Lambda function with the necessary permissions to execute AWS Support Automation Workflows and interact with required AWS services. This role follows the principle of least privilege to maintain security best practices.
- AWS Support Automation Workflows – These pre-built diagnostic runbooks are developed by AWS Support Engineering. The workflows execute comprehensive system checks based on AWS best practices in a standardized, repeatable manner. They cover a wide range of AWS services and common issues, encapsulating AWS Support's extensive troubleshooting expertise.
The following steps outline the workflow of our solution:
- Users start by describing their AWS resource issue in natural language through the Amazon Bedrock chat console. For example, "Why isn't my EKS worker node joining the cluster?"
- The Amazon Bedrock agent analyzes the user's question and matches it to the appropriate action defined in its OpenAPI schema. If essential information is missing, such as a cluster name or instance ID, the agent engages in a natural conversation to gather the required parameters. This makes sure that the necessary data is collected before the troubleshooting workflow proceeds.
- The Lambda function receives the validated request and triggers the corresponding AWS Support Automation Workflow. These SAW runbooks contain comprehensive diagnostic checks developed by AWS Support Engineering to identify common issues and their root causes. The checks run automatically without requiring user intervention.
- The SAW runbook systematically executes its diagnostic checks and compiles the findings. These results, including identified issues and configuration problems, are structured in JSON format and returned to the Lambda function.
- The Amazon Bedrock agent processes the diagnostic results using chain-of-thought (CoT) reasoning, based on the ReAct (synergizing reasoning and acting) approach. This enables the agent to analyze the technical findings, identify root causes, generate clear explanations, and provide step-by-step remediation guidance.
During the agent's reasoning phase, the user can view the reasoning steps.
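To make the reasoning steps concrete outside the console, the following is a minimal sketch using the AWS SDK for Python (Boto3). It assumes a bedrock-agent-runtime client and that you already know the agent ID and an agent alias ID; the function name ask_troubleshooting_agent is hypothetical and not part of the sample repository. With enableTrace=True, trace events stream alongside the answer, and the sketch keeps only the rationale text from orchestration traces.

```python
from typing import Any, List, Tuple


def ask_troubleshooting_agent(
    client: Any,
    agent_id: str,
    agent_alias_id: str,
    session_id: str,
    question: str,
) -> Tuple[str, List[str]]:
    """Send one turn to a Bedrock agent and return (answer, rationale steps).

    Hypothetical helper; `client` is expected to be a boto3
    "bedrock-agent-runtime" client.
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,  # reuse the same ID to continue a conversation
        inputText=question,
        enableTrace=True,  # stream trace events alongside the answer
    )
    answer_parts: List[str] = []
    rationale: List[str] = []
    # The completion is an event stream; each event is a dict keyed by its type.
    for event in response["completion"]:
        if "chunk" in event:
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            orchestration = event["trace"].get("trace", {}).get("orchestrationTrace", {})
            text = orchestration.get("rationale", {}).get("text")
            if text:
                rationale.append(text)
    return "".join(answer_parts), rationale
```

For example, create the client with boto3.client("bedrock-agent-runtime"), pass the agent ID from the cdk deploy output, and reuse one session ID across turns so that the agent's follow-up question and your answer belong to the same conversation. The built-in test alias ID TSTALIASID targets the agent's working draft.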
Troubleshooting examples
Let's take a closer look at a common issue we mentioned earlier and how our agent can assist in troubleshooting it.
EKS worker node failed to join EKS cluster
When an EKS worker node fails to join an EKS cluster, our Amazon Bedrock agent can be invoked with the relevant information: the cluster name and the worker node ID. The agent executes the corresponding AWS Support Automation Workflow, which performs checks such as verifying the worker node's IAM role permissions and the required network connectivity.
The automation workflow runs all the checks. The Amazon Bedrock agent then ingests the troubleshooting results, explains the root cause of the issue to the user, and suggests remediation steps based on the AWSSupport-TroubleshootEKSWorkerNode output, such as updating the worker node's IAM role or resolving network configuration issues, so that the user can take the necessary actions to resolve the problem.
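The Lambda code later in this post relies on a routing layer (app.post with Body annotations) to unpack the agent's request and wrap the reply. To show the underlying contract, the following is a minimal sketch of a plain handler that reads cluster_name and worker_id from an OpenAPI action group event and returns the response envelope that Amazon Bedrock Agents expects. It assumes an application/json request body; the function names are hypothetical, and the handler echoes the parameters instead of starting the runbook.

```python
import json
from typing import Any, Dict


def get_request_body(event: Dict[str, Any]) -> Dict[str, str]:
    """Flatten the agent's requestBody properties into {name: value}."""
    content = event.get("requestBody", {}).get("content", {})
    properties = content.get("application/json", {}).get("properties", [])
    return {prop["name"]: prop["value"] for prop in properties}


def build_response(event: Dict[str, Any], status_code: int, body: Any) -> Dict[str, Any]:
    """Wrap a result in the response envelope an OpenAPI action group expects."""
    return {
        "messageVersion": "1.0",
        "response": {
            # Echo back the routing fields the agent sent.
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status_code,
            # The body is a JSON-formatted string, keyed by content type.
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    params = get_request_body(event)
    missing = [k for k in ("cluster_name", "worker_id") if not params.get(k)]
    if missing:
        return build_response(event, 400, {"error": "missing: " + ", ".join(missing)})
    # A real handler would start AWSSupport-TroubleshootEKSWorkerNode here;
    # this sketch only echoes the parsed parameters back to the agent.
    return build_response(event, 200, {"received": params})
```

Assuming the sample uses Powertools for AWS Lambda (Python), its BedrockAgentResolver performs this parsing and wrapping for you, which is why the function in this post only declares typed parameters.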
OpenAPI example
When you create an action group in Amazon Bedrock, you must define the parameters that the agent needs to elicit from the user. You can also define API operations that the agent can invoke using these parameters. To define the API operations, we create an OpenAPI schema in JSON:
{
  "Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post": {
    "properties": {
      "cluster_name": {
        "type": "string",
        "title": "Cluster Name",
        "description": "The name of the EKS cluster"
      },
      "worker_id": {
        "type": "string",
        "title": "Worker Id",
        "description": "The ID of the worker node"
      }
    },
    "type": "object",
    "required": [
      "cluster_name",
      "worker_id"
    ],
    "title": "Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post"
  }
}
The schema consists of the following components:
- Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post – This is the name of the schema, which corresponds to the request body for the /troubleshoot-eks-worker-node POST endpoint.
- Properties – This section defines the properties (fields) of the schema:
  - "cluster_name" – This property represents the name of the EKS cluster. It is of string type and has a title and description.
  - "worker_id" – This property represents the ID of the worker node. It is also of string type and has a title and description.
- Type – This property specifies that the schema is of "object" type, meaning it is a collection of key-value pairs.
- Required – This property lists the required fields for the schema, which in this case are "cluster_name" and "worker_id". These fields must be provided in the request body.
- Title – This property provides a human-readable title for the schema, which can be used for documentation purposes.
The OpenAPI schema defines the structure of the request body. To learn more, see Define OpenAPI schemas for your agent's action groups in Amazon Bedrock and OpenAPI specification.
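The required list is what lets the agent notice that a cluster name or instance ID is missing and ask for it. The agent makes that decision with the model rather than with code, but the following hypothetical Python sketch performs the same check mechanically against the schema above: it reports which required fields are still missing and builds a follow-up question from their descriptions. WORKER_NODE_SCHEMA, missing_required, and follow_up_question are illustrative names only.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mirror of the request-body schema above (titles omitted).
WORKER_NODE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "cluster_name": {"type": "string", "description": "The name of the EKS cluster"},
        "worker_id": {"type": "string", "description": "The ID of the worker node"},
    },
    "required": ["cluster_name", "worker_id"],
}


def missing_required(schema: Dict[str, Any], collected: Dict[str, Any]) -> List[str]:
    """Return required fields that are absent or empty, in schema order."""
    return [name for name in schema.get("required", []) if not collected.get(name)]


def follow_up_question(schema: Dict[str, Any], collected: Dict[str, Any]) -> Optional[str]:
    """Build a follow-up question for missing fields, or None if nothing is missing."""
    missing = missing_required(schema, collected)
    if not missing:
        return None
    phrases = []
    for name in missing:
        description = schema["properties"][name]["description"]
        # Lowercase only the first letter so acronyms such as "EKS" survive.
        phrases.append(description[0].lower() + description[1:])
    return "Could you provide " + " and ".join(phrases) + "?"
```

For example, follow_up_question(WORKER_NODE_SCHEMA, {"cluster_name": "my-cluster"}) returns "Could you provide the ID of the worker node?", which mirrors the kind of follow-up the agent asks in the console test later in this post.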
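Lambda function code
Now let's explore the Lambda function code:
# Assumes Powertools for AWS Lambda (Python), whose BedrockAgentResolver
# provides the app.post routing and Body annotations used below.
from typing_extensions import Annotated

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.event_handler import BedrockAgentResolver
from aws_lambda_powertools.event_handler.openapi.params import Body

tracer = Tracer()
app = BedrockAgentResolver()

# execute_automation() is a helper defined elsewhere in the sample repository;
# it starts the SAW runbook and returns the formatted execution output.

@app.post(
    "/troubleshoot-eks-worker-node",
    description="Troubleshoot EKS worker node that failed to join the cluster",
)
@tracer.capture_method
def troubleshoot_eks_worker_node(
    cluster_name: Annotated[str, Body(description="The name of the EKS cluster")],
    worker_id: Annotated[str, Body(description="The ID of the worker node")],
) -> Annotated[dict, Body(description="The output of the Automation execution")]:
    """
    Troubleshoot an EKS worker node that failed to join the cluster.

    Args:
        cluster_name (str): The name of the EKS cluster.
        worker_id (str): The ID of the worker node.

    Returns:
        dict: The output of the Automation execution.
    """
    return execute_automation(
        automation_name="AWSSupport-TroubleshootEKSWorkerNode",
        # SSM Automation expects every parameter value as a list of strings.
        parameters={
            'ClusterName': [cluster_name],
            'WorkerID': [worker_id],
        },
        execution_mode="TroubleshootWorkerNode",
    )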
The code consists of the following components:
- @app.post("/troubleshoot-eks-worker-node", description="Troubleshoot EKS worker node that failed to join the cluster") – This is a decorator that sets up a route for a POST request to the /troubleshoot-eks-worker-node endpoint. The description parameter provides a brief explanation of what this endpoint does.
- @tracer.capture_method – This decorator comes from the Tracer utility in Powertools for AWS Lambda (Python). It records the function's execution as an AWS X-Ray subsegment, capturing information such as its duration and any exceptions raised.
- cluster_name: Annotated[str, Body(description="The name of the EKS cluster")] – This parameter specifies that cluster_name is a string that is expected to be passed in the request body. The Body annotation indicates that this parameter should be extracted from the request body, and its description parameter provides a brief explanation of what the parameter represents.
- worker_id: Annotated[str, Body(description="The ID of the worker node")] – This parameter specifies that worker_id is a string that is expected to be passed in the request body.
- -> Annotated[dict, Body(description="The output of the Automation execution")] – This is the return type of the function, which is a dictionary. The Annotated type attaches additional metadata to the return value, specifically that it is included in the response body. The description parameter provides a brief explanation of what the return value represents.
To link a new SAW runbook in the Lambda function, you can follow the same template.
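The execute_automation helper called above is defined in the sample repository and is not shown in this post. As a rough illustration of what such a helper has to do, the following sketch uses the Systems Manager start_automation_execution and get_automation_execution APIs to start a runbook and poll until it reaches a final status. The function name run_automation, the polling interval, the timeout, and the set of terminal statuses are assumptions, and the execution_mode argument of the original helper is not modeled. Because a Lambda function can run for at most 15 minutes, keep timeout_seconds below the function's configured timeout.

```python
import time
from typing import Any, Callable, Dict, List

# Statuses after which an Automation execution is assumed not to change again.
TERMINAL_STATUSES = {
    "Success", "Failed", "TimedOut", "Cancelled",
    "CompletedWithSuccess", "CompletedWithFailure", "Exited", "Rejected",
}


def run_automation(
    ssm_client: Any,
    document_name: str,
    parameters: Dict[str, List[str]],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start an SSM Automation runbook and poll until it finishes (hypothetical helper)."""
    execution_id = ssm_client.start_automation_execution(
        DocumentName=document_name,
        Parameters=parameters,  # each value is a list of strings
    )["AutomationExecutionId"]

    waited = 0.0
    while True:
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            # Outputs holds the runbook's declared outputs, if any.
            return {
                "ExecutionId": execution_id,
                "Status": status,
                "Outputs": execution.get("Outputs", {}),
            }
        if waited >= timeout_seconds:
            raise TimeoutError(f"{document_name} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

In a Lambda function you would pass boto3.client("ssm") as ssm_client. Injecting both the client and the sleep function keeps the polling logic testable without making AWS calls.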
Prerequisites
Make sure you have the following prerequisites:
Deploy the solution
Complete the following steps to deploy the solution:
- Clone the GitHub repository and go to the root of your downloaded repository folder:
$ git clone https://github.com/aws-samples/sample-bedrock-agent-for-troubleshooting-aws-resources.git
$ cd sample-bedrock-agent-for-troubleshooting-aws-resources
- Install local dependencies:
$ npm install
- Sign in to your AWS account using the AWS CLI by configuring your credential file (replace <PROFILE_NAME> with the profile name of your deployment AWS account):
$ export AWS_PROFILE=<PROFILE_NAME>
- Bootstrap the AWS CDK environment (this is a one-time activity and is not needed if your AWS account is already bootstrapped):
$ cdk bootstrap
- Run the script to replace the placeholders for your AWS account and AWS Region in the config files:
$ cdk deploy --all
Test the agent
Navigate to the Amazon Bedrock Agents console in your Region and find your deployed agent. You can find the agent ID in the cdk deploy command output.
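If you prefer to look up the IDs programmatically, the following sketch uses the list_agents and list_agent_aliases operations of the bedrock-agent (control plane) client to find agents whose name contains a given fragment. The name fragment you pass is an assumption, because the actual agent name depends on the AWS CDK stack; the function name find_agent_ids is hypothetical, and the sketch reads only the first page of aliases for each agent.

```python
from typing import Any, Dict, List


def find_agent_ids(client: Any, name_fragment: str) -> List[Dict[str, Any]]:
    """List agents whose name contains name_fragment, with their alias IDs.

    Hypothetical helper; `client` is expected to be a boto3 "bedrock-agent" client.
    """
    matches: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = client.list_agents(**kwargs)
        for summary in page.get("agentSummaries", []):
            if name_fragment.lower() in summary["agentName"].lower():
                # Only the first page of aliases is read in this sketch.
                aliases = client.list_agent_aliases(agentId=summary["agentId"])
                matches.append({
                    "agentId": summary["agentId"],
                    "agentName": summary["agentName"],
                    "aliasIds": [a["agentAliasId"] for a in aliases.get("agentAliasSummaries", [])],
                })
        token = page.get("nextToken")
        if not token:
            return matches
        # Follow pagination until the service stops returning a token.
        kwargs = {"nextToken": token}
```

For example, find_agent_ids(boto3.client("bedrock-agent"), "troubleshoot") returns each matching agent's ID together with the alias IDs you can pass when invoking it.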
You can now interact with the agent and test troubleshooting a worker node that is not joining an EKS cluster. The following are some example questions:
- I want to troubleshoot why my Amazon EKS worker node is not joining the cluster. Can you help me?
- Why is this instance <instance_ID> not able to join the EKS cluster <Cluster_Name>?
The following screenshot shows the console view of the agent.
The agent understood the question and mapped it to the right action group. It also noticed that the required parameters were missing from the user prompt, so it came back with a follow-up question asking for the Amazon Elastic Compute Cloud (Amazon EC2) instance ID and the EKS cluster name.
We can see the agent's thought process in trace step 1. The agent assesses that the next step is ready to call the right Lambda function and the right API path.
With the results coming back from the runbook, the agent now reviews the troubleshooting outcome. It goes through the information and starts writing the solution, providing instructions for the user to follow.
In the answer provided, the agent was able to spot all the issues and transform them into solution steps. We can also see the agent mentioning the relevant details, such as the IAM policy and the required tag.
Clean up
When implementing Amazon Bedrock Agents, there are no additional charges for resource construction. However, costs are incurred for embedding model and text model invocations on Amazon Bedrock, with charges based on the pricing of each FM used. In this use case, you will also incur costs for Lambda invocations.
To avoid incurring future charges, delete the resources created by the AWS CDK. From the root of your repository folder, run the following command:
$ npm run cdk -- destroy --all
Conclusion
Amazon Bedrock Agents and AWS Support Automation Workflows are powerful tools that, when combined, can significantly streamline AWS resource troubleshooting. In this post, we explored a serverless application built with the AWS CDK that demonstrates how these technologies can be integrated to create an intelligent troubleshooting agent. By defining action groups within the Amazon Bedrock agent and associating them with specific scenarios and automation workflows, we developed a highly efficient process for diagnosing and resolving issues such as Amazon EKS worker node failures.
Our solution showcases the potential for automating complex troubleshooting tasks, saving time and streamlining operations. Powered by Anthropic's Claude 3.5 Sonnet, the agent demonstrates improved understanding of, and responses in, languages other than English, such as French, Japanese, and Spanish, making it accessible to global teams while maintaining its technical accuracy and effectiveness. The agent quickly identifies root causes and provides actionable insights, while automatically executing the relevant AWS Support Automation Workflows. This approach not only minimizes downtime, but also scales effectively to accommodate various AWS services and use cases, making it a versatile foundation for organizations looking to enhance their AWS infrastructure management.
Explore AWS Support Automation Workflows for more use cases, and consider using this solution as a starting point for building more comprehensive troubleshooting agents tailored to your organization's needs. To learn more about using agents to orchestrate workflows, see Automate tasks in your application using conversational agents. For details about using guardrails to safeguard your generative AI applications, refer to Stop harmful content in models using Amazon Bedrock Guardrails.
Happy coding!
Acknowledgements
The authors thank all the reviewers for their valuable feedback.
About the Authors
Wael Dimassi is a Technical Account Manager at AWS, building on his 7-year background as a Machine Learning specialist. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them.
Marwen Benzarti is a Senior Cloud Support Engineer at AWS Support, where he specializes in Infrastructure as Code. With over 4 years at AWS and 2 years of previous experience as a DevOps engineer, Marwen works closely with customers to implement AWS best practices and troubleshoot complex technical challenges. Outside of work, he enjoys playing both competitive multiplayer and immersive story-driven video games.