Claude Code deployment patterns and best practices with Amazon Bedrock
Claude Code is an AI-powered coding assistant from Anthropic that helps developers write, review, and modify code through natural language interactions. Amazon Bedrock is a fully managed service that provides access to foundation models from leading AI companies through a single API. This post shows you how to deploy Claude Code with Amazon Bedrock. You'll learn authentication methods, infrastructure options, and monitoring strategies to deploy securely at enterprise scale.
Recommendations for most enterprises
We recommend the Guidance for Claude Code with Amazon Bedrock, which implements proven patterns that can be deployed in hours.
Deploy Claude Code with this proven stack:
- Direct IdP integration for authentication, federating your enterprise identity provider to AWS IAM through OIDC
- A dedicated AWS account for coding assistant inference, separate from development and production workloads
- OpenTelemetry monitoring with CloudWatch dashboards for usage and cost visibility
This architecture provides secure access with user attribution, capacity management, and visibility into costs and developer productivity.
Authentication methods
Claude Code deployments begin with authenticating to Amazon Bedrock. The authentication decision affects downstream security, monitoring, operations, and developer experience.
Authentication methods comparison
| Feature | API Keys | AWS login | SSO with IAM Identity Center | Direct IdP Integration |
| --- | --- | --- | --- | --- |
| Session duration | Indefinite | Configurable (up to 12 hours) | Configurable (up to 12 hours) | Configurable (up to 12 hours) |
| Setup time | Minutes | Minutes | Hours | Hours |
| Security risk | High | Low | Low | Low |
| User attribution | None | Basic | Basic | Full |
| MFA support | No | Yes | Yes | Yes |
| OpenTelemetry integration | None | Limited | Limited | Full |
| Cost allocation | None | Limited | Limited | Full |
| Operational overhead | High | Medium | Medium | Low |
| Use case | Short-term testing | Testing and limited deployments | Rapid SSO deployment | Production deployment |
The following sections discuss the trade-offs and implementation considerations laid out in the table above.
API keys
Amazon Bedrock supports API keys as the fastest path to a proof of concept. Both short-term (12-hour) and long-term (indefinite) keys can be generated through the AWS Management Console, AWS CLI, or SDKs.
However, API keys create security vulnerabilities through persistent access without MFA, manual distribution requirements, and the risk of repository commits. They provide no user attribution for cost allocation or monitoring. Use them only for short-term testing (under one week, with 12-hour expiration).
AWS login
The aws login command uses your AWS Management Console credentials for Amazon Bedrock access through a browser-based authentication flow. It supports quick setup without API keys and is recommended for testing and small deployments.
Single Sign-On (SSO)
AWS IAM Identity Center integrates with existing enterprise identity providers through OpenID Connect (OIDC), an authentication protocol that enables single sign-on by allowing identity providers to verify user identities and share authentication information with applications. This integration lets developers use corporate credentials to access Amazon Bedrock without distributing API keys.
Developers authenticate with AWS IAM Identity Center using the aws sso login command, which generates temporary credentials with configurable session durations. These credentials refresh automatically, reducing the operational overhead of credential management while improving security through temporary, time-limited access.
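As a sketch, a developer's AWS CLI profile for IAM Identity Center might look like the following; the start URL, account ID, and role name are placeholders, not values from the guidance solution:

```ini
[profile claude-code]
sso_start_url  = https://example.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 111122223333
sso_role_name  = ClaudeCodeBedrockAccess
region         = us-east-1
```

Running aws sso login --profile claude-code then opens a browser for corporate authentication and caches the temporary credentials locally.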
Organizations already using IAM Identity Center for AWS access can extend this pattern to Claude Code. However, it limits detailed user-level monitoring because it does not expose OIDC JWT tokens for OpenTelemetry attribute extraction.
This authentication method suits organizations that prioritize rapid SSO deployment over detailed monitoring, or initial rollouts where comprehensive metrics aren't yet required.
Direct IdP integration
Direct OIDC federation with your identity provider (Okta, Azure AD, Auth0, or Amazon Cognito user pools) is recommended for production Claude Code deployments. This approach connects your enterprise identity provider directly to AWS IAM to generate temporary credentials with full user context for monitoring.
The credential process provider orchestrates OAuth 2.0 authentication with PKCE, a security extension that helps prevent authorization code interception. Developers authenticate in their browser, exchanging OIDC tokens for temporary AWS credentials.
A helper script uses the AWS Security Token Service (STS) AssumeRoleWithWebIdentity API to assume a role with permissions to call InvokeModel and InvokeModelWithResponseStream on Amazon Bedrock. Direct IAM federation supports session durations up to 12 hours, and the JWT token remains accessible throughout the session, enabling OpenTelemetry monitoring to track user attributes like email, department, and team.
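The guidance solution ships its own helper, but the flow can be sketched in Python. The role ARN, session name, and token handling below are illustrative assumptions; the JSON emitted by to_credential_process_json follows the AWS CLI's documented credential_process contract:

```python
import json


def to_credential_process_json(creds: dict) -> str:
    """Format an STS Credentials dict into the AWS CLI credential_process contract."""
    return json.dumps({
        "Version": 1,
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "SessionToken": creds["SessionToken"],
        "Expiration": creds["Expiration"].isoformat(),
    })


def fetch_credentials(oidc_token: str, role_arn: str) -> dict:
    """Exchange an OIDC JWT for temporary AWS credentials (requires boto3; not run here)."""
    import boto3
    sts = boto3.client("sts")
    resp = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="claude-code",
        WebIdentityToken=oidc_token,
        DurationSeconds=43200,  # 12 hours, if the role's maximum session duration allows it
    )
    return resp["Credentials"]
```

A script registered in the AWS CLI profile as credential_process would call fetch_credentials after the browser-based OAuth 2.0/PKCE exchange and print the result of to_credential_process_json to stdout.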
The Guidance for Claude Code with Amazon Bedrock implements both Cognito identity pool and direct IAM federation patterns, but recommends direct IAM for simplicity. The solution provides an interactive setup wizard that configures your OIDC provider integration, deploys the required IAM infrastructure, and builds distribution packages for Windows, macOS, and Linux.
Developers receive installation packages that configure their AWS CLI profile to use the credential process. Authentication occurs through corporate credentials, with the browser opening automatically to refresh credentials. The credential process handles token caching, credential refresh, and error recovery.
For organizations requiring detailed usage monitoring, cost attribution by developer, and comprehensive audit trails, direct IdP integration through IAM federation provides the foundation for the advanced monitoring capabilities discussed later in this post.
Organizational decisions
Beyond authentication, architectural decisions shape how Claude Code integrates with your AWS infrastructure. These choices affect operational complexity, cost management, and enforcement of usage policies.
Public endpoints
Amazon Bedrock provides managed, public API endpoints in multiple AWS Regions with minimal operational overhead. AWS manages infrastructure, scaling, availability, and security patching. Developers use standard AWS credentials through AWS CLI profiles or environment variables. Combined with OpenTelemetry metrics from direct IdP integration, you can monitor usage through public endpoints by individual developer, department, or cost center, and controls can be enforced at the AWS IAM level. Some controls require additional infrastructure: for example, per-developer rate limiting requires components that observe CloudWatch metrics or CloudTrail logs and take automated action. Organizations requiring immediate, request-level blocking based on custom business logic may need additional components such as an LLM (large language model) gateway pattern. Public Amazon Bedrock endpoints are sufficient for most organizations because they provide a balance of simplicity, AWS-managed reliability, cost alerting, and appropriate control mechanisms.
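For example, the federated role can be limited to model invocation only. The following IAM policy is a minimal sketch: the ARN pattern is illustrative, and cross-Region inference profiles additionally require permission on the corresponding inference-profile ARNs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ClaudeCodeInvokeOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.*"
    }
  ]
}
```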
LLM gateway
An LLM gateway introduces an intermediary application layer between developers and Amazon Bedrock, routing requests through custom infrastructure. The Guidance for Multi-Provider Generative AI Gateway on AWS describes this pattern, deploying a containerized proxy service with load balancing and centralized credential management.
This architecture is best for:
- Multi-provider support: Routing between Amazon Bedrock, OpenAI, and Azure OpenAI based on availability, cost, or capability
- Custom middleware: Proprietary prompt engineering, content filtering, or prompt injection detection at the request level
- Request-level policy enforcement: Immediate blocking of requests that exceed custom business logic beyond IAM capabilities
Gateways provide unified APIs and real-time monitoring but add operational overhead: Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS) infrastructure, Elastic Load Balancing (ELB) Application Load Balancers, Amazon ElastiCache, and Amazon Relational Database Service (Amazon RDS) to manage, plus increased latency and a new failure mode where gateway issues block Claude Code usage. LLM gateways excel for applications making programmatic calls to LLMs, providing centralized monitoring, per-user visibility, and unified control across providers.
For traditional API access scenarios, organizations can deploy gateways to gain monitoring and attribution capabilities. However, the Claude Code guidance solution already includes monitoring and attribution through direct IdP authentication, OpenTelemetry metrics, IAM policies, and CloudWatch dashboards, so adding an LLM gateway to it duplicates existing functionality. Consider gateways only for multi-provider support, custom middleware, or request-level policy enforcement beyond IAM.
Single account implementation
We recommend consolidating coding assistant inference in a single dedicated account, separate from your development and production workloads. This approach provides five key benefits:
- Simplified operations: Manage quotas and monitor usage through unified dashboards instead of tracking across multiple accounts. Request quota increases once rather than per account.
- Clear cost visibility: AWS Cost Explorer and Cost and Usage Reports show Claude Code charges directly without complex tagging. OpenTelemetry metrics enable department- and team-level allocation.
- Centralized security: CloudTrail logs flow to one location for monitoring and compliance. Deploy the monitoring stack once to collect metrics from all developers.
- Production protection: Account-level isolation helps prevent Claude Code usage from exhausting quotas and throttling production applications. Production traffic spikes don't affect developer productivity.
- Straightforward implementation: Cross-account IAM configuration lets developers authenticate through identity providers that federate to scoped roles, granting only model invocation permissions with appropriate guardrails.
This strategy integrates with direct IdP authentication and OpenTelemetry monitoring. Identity providers handle authentication, the dedicated account handles inference, and development accounts focus on applications.
Inference profiles
Amazon Bedrock inference profiles provide cost tracking through resource tagging, but don't scale to per-developer granularity. While you can create application profiles for cost allocation, managing profiles for 1,000+ individual developers becomes operationally burdensome. Inference profiles work best for organizations with 10-50 distinct teams requiring isolated cost tracking, or when using cross-Region inference, where managed routing distributes requests across AWS Regions. They're ideal for scenarios requiring basic cost allocation rather than comprehensive monitoring.
System-defined cross-Region inference profiles automatically route requests across multiple AWS Regions, distributing load for higher throughput and availability. When you invoke a cross-Region profile (e.g., us.anthropic.claude-sonnet-4), Amazon Bedrock selects an available Region to process your request.
Application inference profiles are profiles you create explicitly in your account, typically wrapped around a system-defined profile or a specific model in a Region. You can tag application profiles with custom key-value pairs like team:data-science or project:fraud-detection that flow to AWS Cost and Usage Reports for cost allocation analysis. To create an application profile:
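A minimal boto3 sketch follows; the profile name, tag values, and source profile ARN are illustrative placeholders:

```python
def tags_from(pairs: dict) -> list:
    """Convert {'team': 'data-science'} into Bedrock's lowercase key/value tag shape."""
    return [{"key": k, "value": v} for k, v in sorted(pairs.items())]


def create_team_profile(name: str, source_profile_arn: str, tag_pairs: dict) -> dict:
    """Create a tagged application inference profile wrapping an existing profile or model
    (requires boto3 and AWS credentials; not run here)."""
    import boto3
    bedrock = boto3.client("bedrock")
    return bedrock.create_inference_profile(
        inferenceProfileName=name,
        modelSource={"copyFrom": source_profile_arn},
        tags=tags_from(tag_pairs),
    )
```

The response includes the new profile's ARN, which developers then reference in their invocation configuration instead of the bare model ID.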
Tags appear in AWS Cost and Usage Reports, so you can answer questions like:
"What did the data-science team spend on Amazon Bedrock last month?"
Each profile must be referenced explicitly in API calls, meaning developers' credential configurations must specify their unique profile rather than a shared endpoint.
For more on inference profiles, see the Amazon Bedrock inference profiles documentation.
Monitoring
An effective monitoring strategy transforms Claude Code from a productivity tool into a measurable investment by tracking usage, costs, and impact.
Progressive enhancement path
The monitoring layers are complementary. Organizations typically start with basic visibility and add capabilities as ROI requirements justify additional infrastructure.

Let's explore each level and when it makes sense for your deployment.
Note: Infrastructure costs grow progressively; each level keeps the previous layers while adding new components.
CloudWatch
Amazon Bedrock publishes metrics to Amazon CloudWatch automatically, tracking invocation counts, throttling errors, and latency. CloudWatch graphs show aggregate trends such as total requests, average latency, and quota utilization. This baseline monitoring is included in standard CloudWatch pricing and requires minimal deployment effort. You can create CloudWatch alarms that notify you when invocation rates spike, error rates exceed thresholds, or latency degrades.
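As an illustration, the following sketch builds an alarm on throttling using the AWS/Bedrock CloudWatch namespace; the alarm name, threshold, and SNS topic are assumptions to tune for your environment:

```python
def throttle_alarm_kwargs(topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for a Bedrock throttling alarm."""
    return {
        "AlarmName": "bedrock-invocation-throttles",
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationThrottles",
        "Statistic": "Sum",
        "Period": 300,                 # evaluate 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 10.0,             # tune to your quota headroom
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],   # notify an SNS topic when breached
    }


def create_throttle_alarm(topic_arn: str) -> None:
    """Register the alarm with CloudWatch (requires boto3 and AWS credentials; not run here)."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm_kwargs(topic_arn))
```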
Invocation logging
Amazon Bedrock invocation logging captures detailed information about each API call to Amazon S3 or CloudWatch Logs, preserving individual request records including invocation metadata and full request/response data. Process the logs with Amazon Athena, load them into data warehouses, or analyze them with custom tools. The logs reveal usage patterns, invocations by model, peak usage times, and an audit trail of Amazon Bedrock access.
OpenTelemetry
Claude Code includes support for OpenTelemetry, an open source observability framework for collecting application telemetry data. When configured with an OpenTelemetry collector endpoint, Claude Code emits detailed metrics about its operations, covering both Amazon Bedrock API calls and higher-level development activities.
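Configuration is driven by environment variables. The variable names below match Claude Code's telemetry documentation at the time of writing, and the collector endpoint is a placeholder for your own:

```shell
# Enable Claude Code telemetry and point it at your OTLP collector.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com:4318
```

In a managed rollout, these would typically be set by the distributed installation package rather than by individual developers.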
The telemetry captures detailed code-level metrics not included in Amazon Bedrock's default logging, such as lines of code added and deleted, files modified, programming languages used, and developers' acceptance rates of Claude's suggestions. It also tracks key operations including file edits, code searches, documentation requests, and refactoring tasks.
The guidance solution deploys OpenTelemetry infrastructure on Amazon ECS with AWS Fargate. An Application Load Balancer receives telemetry over HTTP(S) and forwards metrics to an OpenTelemetry Collector, which exports the data to Amazon CloudWatch and Amazon S3.
Dashboard
The guidance solution includes a CloudWatch dashboard that displays key metrics continuously, tracking active users by hour, day, or week to reveal adoption and usage trends and enable per-user cost calculation. Token consumption breaks down by input, output, and cached tokens, with high cache hit rates indicating efficient context reuse and per-user views identifying heavy users. Code activity metrics track lines added and deleted, correlating with token usage to show efficiency and usage patterns.
The operations breakdown shows the distribution of file edits, code searches, and documentation requests, while user leaderboards display top consumers by tokens, lines of code, or session duration.
The dashboard updates in near-real time and integrates with CloudWatch alarms to trigger notifications when metrics exceed thresholds. The guidance solution deploys it via CloudFormation, with custom Lambda functions for complex aggregations.
Analytics
While dashboards excel at real-time monitoring, long-term trends and complex user behavior analysis require analytical tools. The guidance solution's optional analytics stack streams metrics to Amazon S3 using Amazon Data Firehose. The AWS Glue Data Catalog defines the schema, making the data queryable through Amazon Athena.
The analytics layer supports queries such as monthly token consumption by department, code acceptance rates by programming language, and token efficiency variations across teams. Cost analysis becomes more sophisticated: join token metrics with Amazon Bedrock pricing to calculate actual costs by user, then aggregate for department-level chargeback. Time-series analysis reveals how costs scale with team growth for budget forecasting. The SQL interface integrates with business intelligence tools, enabling exports to spreadsheets, machine learning models, or project management systems.
For example, to see monthly cost analysis by department:
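A sketch of such an Athena query follows; the table and column names are hypothetical and should be matched to the schema your AWS Glue Data Catalog actually defines:

```sql
-- Monthly estimated Claude Code cost per department.
SELECT department,
       date_trunc('month', event_time) AS usage_month,
       SUM(input_tokens + output_tokens) AS total_tokens,
       SUM(estimated_cost_usd) AS monthly_cost_usd
FROM claude_code_metrics
GROUP BY department, date_trunc('month', event_time)
ORDER BY usage_month, monthly_cost_usd DESC;
```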
The infrastructure adds moderate cost: Data Firehose charges for ingestion, S3 for retention, and Athena per query based on data scanned.
Enable analytics when you need historical analysis, complex queries, or integration with business intelligence tools. While the dashboard alone may suffice for small deployments or organizations focused primarily on real-time monitoring, enterprises making significant investments in Claude Code should implement the analytics layer. It provides the visibility needed to demonstrate return on investment and optimize usage over time.
Quotas
Quotas allow organizations to control and manage token consumption by setting usage limits for individual developers or teams. Before implementing quotas, we recommend first enabling monitoring to understand natural usage patterns. Usage data typically reveals that high token consumption correlates with high productivity, indicating that heavy users deliver proportional value.
The quota system stores limits in DynamoDB with entries like:
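The exact schema belongs to the guidance solution; the following is an illustrative sketch of such an entry and the threshold check an aggregation job might apply (the attribute names are assumptions, not the solution's real schema):

```python
# Illustrative quota entry; real attribute names may differ.
QUOTA_ITEM = {
    "user_email": "dev@example.com",
    "monthly_token_limit": 50_000_000,
    "tokens_used": 32_500_000,
    "alert_thresholds": [0.5, 0.8, 1.0],  # fractions of the limit that trigger alerts
}


def crossed_thresholds(item: dict) -> list:
    """Return the alert thresholds the user's consumption has reached or passed."""
    used_fraction = item["tokens_used"] / item["monthly_token_limit"]
    return [t for t in item["alert_thresholds"] if used_fraction >= t]
```

Here crossed_thresholds(QUOTA_ITEM) returns [0.5], so only the 50 percent alert would have fired for this user.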
A Lambda function triggered by CloudWatch Events aggregates token consumption every 15 minutes, updating DynamoDB and publishing to SNS when thresholds are crossed.
Monitoring comparison
The following table summarizes the trade-offs across monitoring approaches:
| Capability | CloudWatch | Invocation logging | OpenTelemetry | Dashboard and analytics |
| --- | --- | --- | --- | --- |
| Setup complexity | None | Low | Medium | Medium |
| User attribution | None | IAM identity | Full | Full |
| Real-time metrics | Yes | No | Yes | Yes |
| Code-level metrics | No | No | Yes | Yes |
| Historical analysis | Limited | Yes | Yes | Yes |
| Cost allocation | Account level | Account level | User, team, department | User, team, department |
| Token tracking | Aggregate | Per-request | Per-user | Per-user with trends |
| Quota enforcement | Manual | Manual | Possible | Possible |
| Operational overhead | Minimal | Low | Medium | Medium |
| Cost | Minimal | Low | Medium | Medium |
| Use case | POC | Basic auditing | Production | Enterprise with ROI requirements |
Putting it together
This section synthesizes authentication methods, organizational architecture, and monitoring strategies into a recommended deployment pattern, with guidance on implementation priorities as your deployment matures. The architecture balances security, operational simplicity, and comprehensive visibility: developers authenticate once per day with corporate credentials, administrators see real-time usage in dashboards, and security teams have CloudTrail audit logs and comprehensive user-attributed metrics through OpenTelemetry.
Implementation path
The guidance solution supports rapid deployment through an interactive setup process, with authentication and monitoring working within hours. Deploy the full stack to a pilot group first, gather real usage data, then expand based on validated patterns.
- Deployment – Clone the Guidance for Claude Code with Amazon Bedrock repository and run the interactive poetry run ccwb init wizard. The wizard configures your identity provider, federation type, AWS Regions, and optional monitoring. Deploy the CloudFormation stacks (typically 15-30 minutes), build distribution packages, and test authentication locally before distributing to users.
- Distribution – Identify a pilot group of 5-20 developers from different teams. This group will validate authentication and monitoring and provide usage data for full rollout planning. If you enabled monitoring, the CloudWatch dashboard shows activity immediately. You can track token consumption, code acceptance rates, and operation types to estimate capacity requirements, identify training needs, and demonstrate value for a broader rollout.
- Expansion – Once Claude Code is validated, expand adoption by team or department. Add the analytics stack (typically 1-2 hours) for historical trend analysis to see adoption rates, high-performing teams, and cost forecasts.
- Optimization – Use monitoring data for continuous improvement through regular review cycles with development leadership. The monitoring data can demonstrate value, identify training needs, and guide capacity adjustments.
When to deviate from the recommended pattern
While the architecture above suits most enterprise deployments, specific circumstances might justify different approaches.
- Consider an LLM gateway if you need multiple LLM providers beyond Amazon Bedrock or custom middleware for prompt processing or response filtering, or if you operate in a regulatory environment requiring request-level policy enforcement beyond AWS IAM capabilities.
- Consider inference profiles when you have under 50 teams requiring separate cost tracking and prefer AWS-native billing allocation over telemetry metrics. Inference profiles work well for project-based cost allocation but don't scale to per-developer tracking.
- Consider starting without monitoring for time-limited pilots with under 10 developers where basic CloudWatch metrics suffice. Plan to add monitoring before scaling, as retrofitting requires redistributing packages to developers.
- Consider API keys only for time-boxed testing (under one week) where the security risks are acceptable.
Conclusion
Deploying Claude Code with Amazon Bedrock at enterprise scale requires thoughtful authentication, architecture, and monitoring decisions. Production-ready deployments follow a clear pattern: direct IdP integration provides secure, user-attributed access, a dedicated AWS account simplifies capacity management, and OpenTelemetry monitoring provides visibility into costs and developer productivity. The Guidance for Claude Code with Amazon Bedrock implements these patterns in a deployable solution. Start with authentication and basic monitoring, then progressively add capabilities as you scale.
As AI-powered development tools become the industry standard, organizations that prioritize security, monitoring, and operational excellence in their deployments will gain lasting advantages. This guide provides a comprehensive framework to help you maximize Claude Code's potential across your enterprise.
To get started, visit the Guidance for Claude Code with Amazon Bedrock repository.
About the authors
Court Schuett is a Principal Specialist Solutions Architect – GenAI who spends his days working with AI coding assistants to help others get the most out of them. Outside of work, Court enjoys traveling, listening to music, and woodworking.
Jawhny Cooke is the Global Tech Lead for Anthropic's Claude Code at AWS, where he specializes in helping enterprises operationalize agentic coding at scale. He partners with customers and partners to solve the complex production challenges of AI-assisted development, from designing autonomous coding workflows and orchestrating multi-agent systems to operational optimization on AWS infrastructure. His work bridges cutting-edge AI capabilities with enterprise-grade reliability to help organizations confidently adopt Claude Code in production environments.
Karan Lakhwani is a Sr. Customer Solutions Manager at Amazon Web Services. He specializes in generative AI technologies and is an AWS Golden Jacket recipient. Outside of work, Karan enjoys finding new restaurants and snowboarding.
Gabe Levy is an Associate Delivery Consultant at AWS based out of New York, primarily focused on application development in the cloud. Gabe has a sub-specialization in artificial intelligence and machine learning. When not working with AWS customers, he enjoys exercising, reading, and spending time with family and friends.
Gabriel Velazquez Lopez is a GenAI Product Leader at AWS, where he leads strategy, go-to-market, and product launches for Claude on AWS in partnership with Anthropic.