Amazon Bedrock AgentCore Observability with Langfuse
The rise of artificial intelligence (AI) agents marks a change in software development and in how applications make decisions and interact with users. While traditional systems follow predictable paths, AI agents engage in complex reasoning that remains hidden from view. This invisibility creates a problem for organizations: how can they trust what they can't see? This is where agent observability enters the picture, offering deep insights into how agentic applications perform, interact, and execute tasks.
In this post, we explain how to integrate Langfuse observability with Amazon Bedrock AgentCore to gain deep visibility into an AI agent's performance, debug issues faster, and optimize costs. We walk through a complete implementation using Strands agents deployed on AgentCore Runtime, followed by step-by-step code examples.
Amazon Bedrock AgentCore is a comprehensive agentic platform for deploying and operating highly capable AI agents securely, at scale. It offers purpose-built infrastructure for dynamic agent workloads, powerful tools to enhance agents, and essential controls for real-world deployment. AgentCore is composed of fully managed services that can be used together or independently. These services work with any framework, including CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model in or outside of Amazon Bedrock, offering flexibility and reliability. AgentCore emits telemetry data in a standardized OpenTelemetry (OTEL)-compatible format, enabling straightforward integration with an existing monitoring and observability stack. It offers detailed visualizations of each step in the agent workflow, so you can inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures.
How Langfuse tracing works
Langfuse uses OpenTelemetry to trace and monitor agents deployed on Amazon Bedrock AgentCore. OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project that provides a set of specifications, APIs, and libraries defining a standard way to collect distributed traces and metrics from an application. You can track performance metrics including token usage, latency, and execution durations across different processing phases. The system creates hierarchical trace structures that capture both streaming and non-streaming responses, with detailed operation attributes and error states.
Through the /api/public/otel endpoint, Langfuse functions as an OpenTelemetry backend, mapping traces to its data model using generative AI semantic conventions. This is particularly valuable for complex large language model (LLM) applications using chains and agents with tools, where nested traces help developers quickly identify and resolve issues. The integration supports systematic debugging, performance monitoring, and audit trail maintenance, making it easier for teams to build and maintain reliable AI applications on Amazon Bedrock AgentCore.
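In practice, wiring a standard OTLP exporter to Langfuse comes down to two environment variables: the endpoint and a Basic auth header built from your project's public and secret keys. The following is a minimal sketch (the keys are placeholders, and the host differs if you use the EU region or a self-hosted deployment):

```python
import base64
import os

# Langfuse project keys (placeholders; create these in your Langfuse project)
LANGFUSE_PUBLIC_KEY = "pk-lf-..."
LANGFUSE_SECRET_KEY = "sk-lf-..."

# Langfuse authenticates OTLP requests with HTTP Basic auth:
# base64("<public_key>:<secret_key>")
auth = base64.b64encode(
    f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()
).decode()

# Point any OTLP/HTTP exporter at the Langfuse OTEL endpoint
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```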
In addition to agent observability, Langfuse offers a set of built-in tools covering the full LLM application development lifecycle. This includes running automated LLM-as-a-judge evaluators (online and offline), organizing data labeling for root cause analysis and evaluator alignment, tracking experiments (locally and in CI), iterating on prompts interactively in a playground, and version-controlling them in the UI using prompt management.
Solution overview
This post shows how to deploy a Strands agent on Amazon Bedrock AgentCore Runtime with Langfuse observability. The implementation uses Anthropic Claude models through Amazon Bedrock. Telemetry data flows from the Strands agent through OTEL exporters to Langfuse for monitoring and debugging. To use Langfuse, set disable_otel=True in the AgentCore Runtime deployment, which turns off AgentCore's default observability.
Figure 1: Architecture overview
Key components used in the solution are:
- Strands Agents: Python framework for building LLM-powered agents with built-in telemetry support
- Amazon Bedrock AgentCore Runtime: Managed runtime service for hosting and scaling agents on Amazon Web Services (AWS)
- Langfuse: Open source observability and evaluation platform for LLM applications that receives traces via OTEL
- OpenTelemetry: Industry-standard protocol for collecting and exporting telemetry data
Technical implementation guide
Now that we have covered how Langfuse tracing works, we can walk through how to implement it with Amazon Bedrock AgentCore.
Prerequisites
- An AWS account
- AWS credentials configured correctly before using Amazon Bedrock. They can be set up using the AWS CLI or by setting environment variables. For this notebook, we assume that the credentials are already configured.
- Amazon Bedrock model access for Anthropic Claude 3.7 Sonnet in the us-west-2 Region
- Amazon Bedrock AgentCore permissions
- Python 3.10+
- Docker installed locally
- A Langfuse account, which is required to create a Langfuse API key.
- Sign up at Langfuse Cloud, create a project, and get API keys
- Alternatively, you can self-host Langfuse in your own AWS account using the Terraform module.
Walkthrough
The following steps walk through how to use Langfuse to collect traces from agents created with the Strands SDK on AgentCore Runtime. You can also refer to this notebook on GitHub to get started right away.
Clone this GitHub repo:
After the repo is cloned, go to the Amazon Bedrock AgentCore Samples directory, find the notebook runtime_with_strands_and_langfuse.ipynb, and start running each cell.
Step 1: Python dependencies and required packages for our Strands agent
Run the cell below to install the dependencies defined in the requirements.txt file.
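In a notebook, that cell is typically a one-liner (a sketch; the sample notebook's exact cell may differ):

```python
# Install the agent's dependencies from the requirements file
%pip install -r requirements.txt
```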
Step 2: Agent implementation
The agent file (strands_claude.py) implements a travel agent with web search capabilities.
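As a rough sketch of what such a file can look like, the following wires a Strands agent with a stubbed web search tool into an AgentCore Runtime entry point. The tool body, system prompt, and names here are illustrative assumptions rather than the exact sample code:

```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent, tool
from strands.models import BedrockModel

app = BedrockAgentCoreApp()

@tool
def web_search(query: str) -> str:
    """Search the web for travel information (stubbed for illustration)."""
    # The sample notebook's tool would call a real search backend here
    return f"Search results for: {query}"

model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

agent = Agent(
    model=model,
    tools=[web_search],
    system_prompt="You are a helpful travel assistant.",  # illustrative prompt
)

@app.entrypoint
def invoke(payload):
    """Runtime entry point: receive a JSON payload, return the agent's reply."""
    result = agent(payload.get("prompt", ""))
    return {"result": str(result)}

if __name__ == "__main__":
    app.run()
```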
Step 3: Configure AgentCore Runtime deployment
Next, use the starter toolkit to configure the AgentCore Runtime deployment with an entry point, the execution role we created, and a requirements file. Additionally, configure the starter toolkit to automatically create the Amazon Elastic Container Registry (Amazon ECR) repository on launch.
During the configure step, the Dockerfile is generated based on the application code. When you use the bedrock_agentcore_starter_toolkit to configure the agent, it enables AgentCore Observability by default. Therefore, to use Langfuse, disable OTEL by setting the disable_otel configuration flag to True, as shown in the following code block.
Figure 2: Configure AgentCore Runtime
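A minimal configuration sketch follows; the agent name and execution role handling are assumptions, and disable_otel=True is the flag the post describes:

```python
from bedrock_agentcore_starter_toolkit import Runtime

agentcore_runtime = Runtime()

# disable_otel=True turns off AgentCore's default ADOT-based observability
# so that traces can flow to Langfuse instead
response = agentcore_runtime.configure(
    entrypoint="strands_claude.py",
    auto_create_execution_role=True,   # or pass an existing execution role ARN
    auto_create_ecr=True,              # create the Amazon ECR repository on launch
    requirements_file="requirements.txt",
    region="us-west-2",
    agent_name="strands_claude_langfuse",  # illustrative name
    disable_otel=True,
)
```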
Step 4: Deploy to AgentCore Runtime
Now that a Dockerfile has been generated, launch the agent to AgentCore Runtime to create the Amazon ECR repository and the AgentCore Runtime.
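With the starter toolkit, the launch is typically a single call (a sketch, reusing the Runtime object from the configure step; the agent_arn attribute is assumed from common toolkit usage):

```python
# Build and push the container image to Amazon ECR, then create the runtime
launch_result = agentcore_runtime.launch()
print(launch_result.agent_arn)
```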
Now configure the Langfuse secret key, public key, and OTEL endpoint in AWS Systems Manager Parameter Store, which provides secure, hierarchical storage for configuration data and secrets management.
The following table describes the configuration parameters being used; a sketch of storing them follows the table.
| Parameter | Description | Default |
|---|---|---|
| `langfuse_public_key` | API key for the OTEL endpoint | Environment variable |
| `langfuse_secret_key` | Secret key for the OTEL endpoint | Environment variable |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Trace endpoint | https://cloud.langfuse.com/api/public/otel/v1/traces |
| `OTEL_EXPORTER_OTLP_HEADERS` | Authentication type | Basic |
| `DISABLE_ADOT_OBSERVABILITY` | Disables AgentCore's default AWS Distro for OpenTelemetry (ADOT) observability so Langfuse can be used instead | True |
| `BEDROCK_MODEL_ID` | Amazon Bedrock model ID | us.anthropic.claude-3-7-sonnet-20250219-v1:0 |
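Storing these values in Parameter Store might look like the following sketch; the parameter names and paths are assumptions, and the secret key is stored as a SecureString:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

# The public key can be a plain String; the secret key should be a SecureString
ssm.put_parameter(
    Name="/langfuse/public_key",   # hypothetical path
    Value="pk-lf-...",
    Type="String",
    Overwrite=True,
)
ssm.put_parameter(
    Name="/langfuse/secret_key",   # hypothetical path
    Value="sk-lf-...",
    Type="SecureString",
    Overwrite=True,
)
ssm.put_parameter(
    Name="/langfuse/otel_endpoint",  # hypothetical path
    Value="https://cloud.langfuse.com/api/public/otel/v1/traces",
    Type="String",
    Overwrite=True,
)
```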
Step 5: Check deployment status
Wait for the runtime to be ready before invoking:
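A simple polling loop looks like the following sketch (the status field names follow common starter toolkit usage and may differ slightly):

```python
import time

# Poll the endpoint status until it leaves its transitional state
status = agentcore_runtime.status().endpoint["status"]
while status not in ("READY", "CREATE_FAILED", "UPDATE_FAILED"):
    time.sleep(10)
    status = agentcore_runtime.status().endpoint["status"]

print(f"Runtime status: {status}")
```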
A successful deployment shows a "Ready" state for the agent runtime.
Step 6: Invoke the AgentCore Runtime
Finally, invoke the AgentCore Runtime with a payload.
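With the starter toolkit, a test invocation can be as simple as the following sketch (the prompt is illustrative):

```python
# Send a JSON payload to the deployed agent's entry point
invoke_response = agentcore_runtime.invoke(
    {"prompt": "Plan a weekend trip to Seattle focused on coffee shops."}
)
print(invoke_response)
```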
After the AgentCore Runtime has been invoked, you should see the traces in the Langfuse dashboard.
Step 7: View traces in Langfuse
After running the agent, go to the Langfuse project to view the detailed traces. The traces include:
- Agent invocation details
- Tool calls (web search)
- Model interactions with latency and token usage
- Request/response payloads
Traces and hierarchy
Langfuse captures all interactions from user requests down to individual model calls. Each trace captures the complete execution path, including API calls, function invocations, and model responses, creating a comprehensive timeline of agent actions. The nested structure of traces lets developers drill down into specific interactions and identify performance bottlenecks or error patterns at any level of the execution chain. To further enhance observability, Langfuse provides tagging mechanisms that can be implemented in agent workflows, as the sketch below illustrates.
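When traces arrive over OTEL, Langfuse maps certain span attributes onto its trace model; per its documented property mapping, tags can be attached with the langfuse.trace.tags attribute. The following is a sketch assuming a configured tracer, with illustrative IDs (verify the attribute names against the current Langfuse docs):

```python
from opentelemetry import trace

tracer = trace.get_tracer("travel-agent")

# Attributes on the enclosing span map onto Langfuse trace fields,
# so tags and IDs become filters in the Langfuse UI
with tracer.start_as_current_span("plan-trip") as span:
    span.set_attribute("langfuse.trace.tags", ["travel", "production"])  # assumed mapping
    span.set_attribute("user.id", "user-123")        # illustrative
    span.set_attribute("session.id", "session-456")  # illustrative
    # ... run the agent inside this span ...
```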
Figure 3: Traces in Langfuse
Combining hierarchical traces with strategic tagging provides insight into agent operations, enabling data-driven optimization and better user experiences. As shown in the following image, developers can drill down into the precise timing of each operation within their agent's execution flow. In this example, the entire request took 26.57s, with individual breakdowns for the event loop cycle, tool calls, and other components. Use this timing information to find performance bottlenecks and reduce response times. For instance, certain LLM operations might take longer than expected, or there might be opportunities to parallelize specific actions to reduce overall latency. Using these insights, you can make data-driven decisions to improve an agent's performance and deliver a better customer experience.
Figure 4: Detailed trace hierarchy
Langfuse dashboard
The Langfuse dashboard features three different views for monitoring: cost, latency, and usage management.
Figure 5: Langfuse dashboard
Cost monitoring
Cost monitoring helps track expenses at both the aggregate and individual request levels to maintain control over AI infrastructure spend. The platform provides detailed cost breakdowns per model, user, and function call, enabling teams to identify cost-intensive operations and optimize their implementation. This granular cost visibility supports data-driven decisions about model selection, prompt engineering, and resource allocation while staying within budget constraints. Dashboard cost data is provided for estimation purposes; actual charges should be verified through official billing statements.
Figure 6: Cost dashboard
Langfuse latency dashboard
Latency metrics can be monitored across traces and generations for performance optimization. The dashboard shows the following metrics by default, and you can create custom charts and dashboards depending on your needs:
- P95 Latency by Level (Observations)
- P95 Latency by Use Case
- Max Latency by User ID (Traces)
- Avg Time to First Token by Prompt Name (Observations)
- P95 Time to First Token by Model
- P95 Latency by Model
- Avg Output Tokens Per Second by Model
Figure 7: Latency dashboard
Langfuse usage management
This dashboard shows metrics across traces, observations, and scores to manage resource allocation.
Figure 8: Usage management dashboard
Conclusion
This post demonstrated how to integrate Langfuse with AgentCore for comprehensive observability of AI agents. You can now track performance, debug interactions, and optimize costs across workflows. We expect more Langfuse observability features and integration options in the future to help scale AI applications.
Start implementing Langfuse with AgentCore today to gain deeper insight into agent performance, track conversation flows, and optimize your AI applications. For more information, visit the following resources:
About the authors
Richa Gupta is a Senior Solutions Architect at Amazon Web Services, specializing in AI/ML, generative AI, and agentic AI. She is passionate about helping customers on their AI transformation journey, architecting end-to-end solutions from proof of concept to production deployment, and driving business revenue. Beyond her professional pursuits, Richa likes to make latte art and is an adventure enthusiast.
Ishan Singh is a Sr. Generative AI Data Scientist at Amazon Web Services, where he partners with customers to architect innovative and responsible generative AI solutions. With deep expertise in AI and machine learning, Ishan leads the development of production generative AI solutions at scale, with a focus on evaluations and observability. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife, kid, and dog, Beau.
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.
Madhu Samhitha is a Specialist Solutions Architect at Amazon Web Services, focused on helping customers implement generative AI solutions. She combines her knowledge of large language models with strategic innovation to deliver business value. She has a Master's in Computer Science from the University of Massachusetts Amherst and has worked in various industries. Beyond her technical role, Madhu is a trained classical dancer, an art enthusiast, and enjoys exploring national parks.
Marc Klingen is the co-founder and CEO of Langfuse, the open source LLM engineering platform. After building LLM agents in 2023 together with his co-founders, Marc and the team realized that new tooling is necessary to bring agents into production and scale them reliably. With Langfuse, they have built the leading open source LLM engineering platform (observability, evaluation, prompt management) with over 18,000 GitHub stars, 14.8M+ SDK installs per month, and 6M+ Docker pulls. Langfuse is used by top engineering teams such as Khan Academy, Samsara, Twilio, and Merck.