Amazon Bedrock AgentCore Observability with Langfuse


The rise of artificial intelligence (AI) agents marks a shift in software development and in how applications make decisions and interact with users. While traditional systems follow predictable paths, AI agents engage in complex reasoning that remains hidden from view. This invisibility creates a challenge for organizations: how can they trust what they can't see? That is where agent observability enters the picture, offering deep insights into how agentic applications perform, interact, and execute tasks.

In this post, we explain how to integrate Langfuse observability with Amazon Bedrock AgentCore to gain deep visibility into an AI agent's performance, debug issues faster, and optimize costs. We walk through a complete implementation using Strands agents deployed on AgentCore Runtime, followed by step-by-step code examples.

Amazon Bedrock AgentCore is a comprehensive agentic platform for deploying and operating highly capable AI agents securely, at scale. It offers purpose-built infrastructure for dynamic agent workloads, powerful tools to enhance agents, and essential controls for real-world deployment. AgentCore comprises fully managed services that can be used together or independently. These services work with any framework, including CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model in or outside of Amazon Bedrock, offering flexibility and reliability. AgentCore emits telemetry data in a standardized OpenTelemetry (OTEL)-compatible format, enabling easy integration with an existing monitoring and observability stack. It offers detailed visualizations of each step in the agent workflow, so you can inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures.

How Langfuse tracing works

Langfuse uses OpenTelemetry to trace and monitor agents deployed on Amazon Bedrock AgentCore. OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project that provides a set of specifications, APIs, and libraries that define a standard way to collect distributed traces and metrics from an application. Users can track performance metrics including token usage, latency, and execution durations across different processing phases. The system creates hierarchical trace structures that capture both streaming and non-streaming responses, with detailed operation attributes and error states.

Through the /api/public/otel endpoint, Langfuse functions as an OpenTelemetry backend, mapping traces to its data model using generative AI conventions. This is particularly valuable for complex large language model (LLM) applications using chains and agents with tools, where nested traces help developers quickly identify and resolve issues. The integration supports systematic debugging, performance monitoring, and audit trail maintenance, making it easier for teams to build and maintain reliable AI applications on Amazon Bedrock AgentCore.
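To illustrate what this looks like in practice, any OTLP-capable exporter can be pointed at this endpoint. The following is a minimal sketch using the vanilla OpenTelemetry Python SDK, with Basic auth built from a Langfuse public/secret key pair (the key values are placeholders); the Strands-specific setup used in this post is covered later.

import base64
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Build the Basic auth token from a Langfuse public/secret key pair (placeholders)
public_key, secret_key = "pk-lf-...", "sk-lf-..."
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

# Point a standard OTLP/HTTP exporter at Langfuse's OTEL traces endpoint
exporter = OTLPSpanExporter(
    endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces",
    headers={"Authorization": f"Basic {auth}"},
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Any span created now is exported to Langfuse and appears as a trace
with trace.get_tracer("demo").start_as_current_span("hello-langfuse"):
    pass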

In addition to agent observability, Langfuse offers a set of built-in tools covering the full LLM application development lifecycle. This includes running automated LLM-as-a-judge evaluators (online and offline), organizing data labeling for root cause analysis and evaluator alignment, tracking experiments (locally and in CI), iterating on prompts interactively in a playground, and version-controlling prompts in the UI using prompt management.

Solution overview

This post shows how to deploy a Strands agent on Amazon Bedrock AgentCore Runtime with Langfuse observability. The implementation uses Anthropic Claude models through Amazon Bedrock. Telemetry data flows from the Strands agent through OTEL exporters to Langfuse for monitoring and debugging. To use Langfuse, set disable_otel=True in the AgentCore Runtime deployment. This turns off AgentCore's default observability.

Figure 1: Architecture overview

Key components used in the solution are:

  • Strands Agents: Python framework for building LLM-powered agents with built-in telemetry support
  • Amazon Bedrock AgentCore Runtime: Managed runtime service for hosting and scaling agents on Amazon Web Services (AWS)
  • Langfuse: Open source observability and evaluation platform for LLM applications that receives traces via OTEL
  • OpenTelemetry: Industry-standard protocol for collecting and exporting telemetry data

Technical implementation guide

Now that we have covered how Langfuse tracing works, we can walk through how to implement it with Amazon Bedrock AgentCore.

Prerequisites

  • An AWS account
    • Before using Amazon Bedrock, confirm all AWS credentials are configured correctly. They can be set up using the AWS CLI or by setting environment variables; see the verification sketch after this list. For this notebook we assume that the credentials are already configured.
  • Amazon Bedrock model access for Anthropic Claude 3.7 in the us-west-2 Region
  • Amazon Bedrock AgentCore permissions
  • Python 3.10+
  • Docker installed locally
  • A Langfuse account, which is required to create a Langfuse API key
    • Users need to sign up at Langfuse Cloud, create a project, and get API keys
    • Alternatively, you can self-host Langfuse within your own AWS account using the Terraform module
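To confirm the AWS prerequisites are in place before proceeding, a quick check along these lines can help (a sketch; the Region matches the one used later in this post):

import boto3

# Verify that AWS credentials resolve to a valid identity
identity = boto3.client("sts").get_caller_identity()
print(f"Authenticated as: {identity['Arn']}")

# Verify Bedrock is reachable in the target Region (model access for
# Anthropic Claude 3.7 must still be granted in the Bedrock console)
bedrock = boto3.client("bedrock", region_name="us-west-2")
models = bedrock.list_foundation_models(byProvider="anthropic")
print(f"Anthropic models visible: {len(models['modelSummaries'])}")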

Walkthrough

The following steps walk through how to use Langfuse for collecting traces from agents created using the Strands SDK in AgentCore Runtime. Users can also refer to this notebook on GitHub to get started right away.

Clone this GitHub repo:

git clone https://github.com/awslabs/amazon-bedrock-agentcore-samples.git

Once the repo is cloned, go to the Amazon Bedrock AgentCore samples directory, open the notebook runtime_with_strands_and_langfuse.ipynb, and start running each cell.

Step 1: Install Python dependencies for the Strands agent

Execute the cell below to install the dependencies defined in the requirements.txt file.

!pip install --force-reinstall -U -r requirements.txt --quiet

Step 2: Agent implementation

The agent file (strands_claude.py) implements a travel agent with web search capabilities.

%%writefile strands_claude.py
import os
import logging
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent, tool
from strands.models import BedrockModel
from strands.telemetry import StrandsTelemetry
from ddgs import DDGS

logging.basicConfig(level=logging.ERROR, format="[%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)
logger.setLevel(os.getenv("AGENT_RUNTIME_LOG_LEVEL", "INFO").upper())

@tool
def web_search(query: str) -> str:
    """
    Search the web for information using DuckDuckGo.
    Args:
        query: The search query
    Returns:
        A string containing the search results
    """
    try:
        ddgs = DDGS()
        results = ddgs.text(query, max_results=5)
        formatted_results = []
        for i, result in enumerate(results, 1):
            formatted_results.append(
                f"{i}. {result.get('title', 'No title')}\n"
                f"   {result.get('body', 'No summary')}\n"
                f"   Source: {result.get('href', 'No URL')}\n"
            )
        return "\n".join(formatted_results) if formatted_results else "No results found."
    except Exception as e:
        return f"Error searching the web: {str(e)}"

# Function to initialize the Bedrock model
def get_bedrock_model():
    region = os.getenv("AWS_DEFAULT_REGION", "us-west-2")
    model_id = os.getenv("BEDROCK_MODEL_ID", "us.anthropic.claude-3-7-sonnet-20250219-v1:0")
    bedrock_model = BedrockModel(
        model_id=model_id,
        region_name=region,
        temperature=0.0,
        max_tokens=1024
    )
    return bedrock_model

# Initialize the Bedrock model
bedrock_model = get_bedrock_model()

# Define the agent's system prompt
system_prompt = """You are an experienced travel agent specializing in personalized travel recommendations
with access to real-time web information. Your role is to find dream destinations matching user preferences
using web search for current information. You should provide comprehensive recommendations with current
information, brief descriptions, and practical travel details."""

app = BedrockAgentCoreApp()

def initialize_agent():
    """Initialize the agent with proper telemetry configuration."""
    # Initialize Strands telemetry with the third-party (3P) OTLP exporter configuration
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()

    # Create and return the agent
    agent = Agent(
        model=bedrock_model,
        system_prompt=system_prompt,
        tools=[web_search]
    )

    return agent

@app.entrypoint
def strands_agent_bedrock(payload, context=None):
    """
    Invoke the agent with a payload
    """
    user_input = payload.get("prompt")
    logger.info("[%s] User input: %s", context.session_id, user_input)

    # Initialize agent with proper configuration
    agent = initialize_agent()

    response = agent(user_input)
    return response.message['content'][0]['text']

if __name__ == "__main__":
    app.run()
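Before deploying, the entrypoint can optionally be smoke-tested locally. The following is a minimal sketch that assumes the AgentCore Runtime service contract of POST /invocations on port 8080; start the agent in another terminal with python strands_claude.py first.

import json
import urllib.request

# Hypothetical local smoke test against the locally running agent
req = urllib.request.Request(
    "http://localhost:8080/invocations",
    data=json.dumps({"prompt": "Suggest one must-see spot in Paris."}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())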

Step 3: Configure AgentCore Runtime deployment

Next, use the starter toolkit to configure the AgentCore Runtime deployment with an entry point, the execution role we created, and a requirements file. Additionally, configure the starter kit to auto-create the Amazon Elastic Container Registry (Amazon ECR) repository on launch.

During the configure step, the Dockerfile is generated based on the application code. When the bedrock_agentcore_starter_toolkit is used to configure the agent, it enables AgentCore Observability by default. Therefore, to use Langfuse, users should disable OTEL by setting the disable_otel flag to True, as shown in the following code block.

Figure 2: Configure AgentCore Runtime

from bedrock_agentcore_starter_toolkit import Runtime
from boto3.session import Session

boto_session = Session()
region = boto_session.region_name

agentcore_runtime = Runtime()
agent_name = "strands_langfuse_observability"
response = agentcore_runtime.configure(
    entrypoint="strands_claude.py",
    auto_create_execution_role=True,
    auto_create_ecr=True,
    requirements_file="requirements.txt",
    region=region,
    agent_name=agent_name,
    disable_otel=True,
)
response

Step 4: Deploy to AgentCore Runtime

Now that a Dockerfile has been generated, launch the agent to the AgentCore Runtime to create the Amazon ECR repository and the AgentCore Runtime.

Now configure the Langfuse secret key, public key, and OTEL endpoint. For production use, store the keys in AWS Systems Manager Parameter Store, which provides secure, hierarchical storage for configuration data management and secrets management; the notebook below uses inline placeholders for simplicity.
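As a sketch of that approach (the /langfuse/* parameter names are illustrative, not from the sample repo), the keys could be written once as SecureString parameters and read back at deployment time:

import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

# One-time setup: store the Langfuse keys as encrypted SecureString parameters
ssm.put_parameter(Name="/langfuse/public-key", Value="pk-lf-...", Type="SecureString", Overwrite=True)
ssm.put_parameter(Name="/langfuse/secret-key", Value="sk-lf-...", Type="SecureString", Overwrite=True)

# At deployment time: read them back with decryption instead of hardcoding them
langfuse_public_key = ssm.get_parameter(Name="/langfuse/public-key", WithDecryption=True)["Parameter"]["Value"]
langfuse_secret_key = ssm.get_parameter(Name="/langfuse/secret-key", WithDecryption=True)["Parameter"]["Value"]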

import base64

# Langfuse configuration
otel_endpoint = "https://us.cloud.langfuse.com/api/public/otel"
langfuse_secret_key = "<Enter your Langfuse secret key>"  # For production, the key should be securely stored
langfuse_public_key = "<Enter your Langfuse public key>"  # For production, the key should be securely stored
langfuse_auth_token = base64.b64encode(f"{langfuse_public_key}:{langfuse_secret_key}".encode()).decode()
otel_auth_header = f"Authorization=Basic {langfuse_auth_token}"

launch_result = agentcore_runtime.launch(
    env_vars={
        "BEDROCK_MODEL_ID": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # Example model ID
        "OTEL_EXPORTER_OTLP_ENDPOINT": otel_endpoint,  # Use Langfuse OTEL endpoint
        "OTEL_EXPORTER_OTLP_HEADERS": otel_auth_header,  # Add Langfuse OTEL auth header
        "DISABLE_ADOT_OBSERVABILITY": "true",
    }
)
launch_result

The following table describes the various configuration parameters being used.

| Parameter | Description | Default |
| --- | --- | --- |
| langfuse_public_key | Public API key for the OTEL endpoint | Environment variable |
| langfuse_secret_key | Secret key for the OTEL endpoint | Environment variable |
| OTEL_EXPORTER_OTLP_ENDPOINT | Trace endpoint | https://cloud.langfuse.com/api/public/otel/v1/traces |
| OTEL_EXPORTER_OTLP_HEADERS | Authentication type | Basic |
| DISABLE_ADOT_OBSERVABILITY | Disables AWS Distro for OpenTelemetry (ADOT), AgentCore's default observability, so that Langfuse is used instead | true |
| BEDROCK_MODEL_ID | Amazon Bedrock model ID | us.anthropic.claude-3-7-sonnet-20250219-v1:0 |

Step 5: Check deployment status

Wait for the runtime to be ready before invoking:

import time

status_response = agentcore_runtime.status()
status = status_response.endpoint['status']
end_status = ['READY', 'CREATE_FAILED', 'DELETE_FAILED', 'UPDATE_FAILED']
while status not in end_status:
    time.sleep(10)
    status_response = agentcore_runtime.status()
    status = status_response.endpoint['status']
    print(status)
status

A successful deployment shows a READY state for the agent runtime.

Step 6: Invoke the AgentCore Runtime

Finally, invoke the AgentCore Runtime with a payload.

invoke_response = agentcore_runtime.invoke({"prompt": "I am planning a weekend trip to London. What are the must-visit places and local foods I should try?"})

Once the AgentCore Runtime has been invoked, users should be able to see the traces in the Langfuse dashboard.
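Outside the notebook, the deployed runtime can also be invoked directly with boto3. The following is a hedged sketch assuming the bedrock-agentcore data-plane client's invoke_agent_runtime API; the launch_result.agent_arn attribute is an assumption about the starter toolkit's return value.

import json
import uuid
import boto3

client = boto3.client("bedrock-agentcore", region_name="us-west-2")

response = client.invoke_agent_runtime(
    agentRuntimeArn=launch_result.agent_arn,  # assumed attribute on the launch result
    runtimeSessionId=str(uuid.uuid4()),       # session IDs group related traces
    payload=json.dumps({"prompt": "Suggest a 3-day itinerary for Tokyo."}),
)
# The response body is a streaming payload; read and decode it
print(response["response"].read().decode())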

Step 7: View traces in Langfuse

After running the agent, go to the Langfuse project to view the detailed traces. The traces include:

  • Agent invocation details
  • Tool calls (web search)
  • Model interactions with latency and token usage
  • Request/response payloads

Traces and hierarchy

Langfuse captures all interactions, from user requests down to individual model calls. Each trace captures the complete execution path, including API calls, function invocations, and model responses, creating a comprehensive timeline of agent actions. The nested structure of traces lets developers drill down into specific interactions and identify performance bottlenecks or error patterns at any level of the execution chain. To further enhance observability capabilities, Langfuse provides tagging mechanisms that can be implemented in agent workflows, as shown in the sketch after Figure 3.

Figure 3: Traces in Langfuse
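For example, tags and session metadata can be attached when the agent is created. The following is a minimal sketch assuming the Strands Agent trace_attributes parameter and Langfuse's OTEL attribute conventions (session.id, user.id, langfuse.tags); verify the exact attribute names against your Strands and Langfuse versions.

from strands import Agent

# Attach Langfuse-mapped attributes to the spans the agent emits (attribute names assumed)
agent = Agent(
    model=bedrock_model,
    system_prompt=system_prompt,
    tools=[web_search],
    trace_attributes={
        "session.id": "weekend-trip-demo-001",          # groups traces into a Langfuse session
        "user.id": "user@example.com",                   # enables per-user cost/latency breakdowns
        "langfuse.tags": ["agentcore", "travel-agent"],  # filterable tags in the Langfuse UI
    },
)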

Combining hierarchical traces with strategic tagging provides insights into agent operations, enabling data-driven optimization and improved user experiences. As shown in the following image, developers can drill down into the precise timing of each operation within their agent's execution flow. In this example, the entire request took 26.57 s, with individual breakdowns for the event loop cycle, tool calls, and other components. Use this timing information to find performance bottlenecks and reduce response times. For instance, certain LLM operations might take longer than expected, or there may be opportunities to parallelize specific actions to reduce overall latency. By using these insights, users can make data-driven decisions to enhance agent performance and deliver a better customer experience.

Figure 4: Detailed trace hierarchy

Langfuse dashboard

The Langfuse dashboard features three different views for monitoring: cost, latency, and usage management.

Figure 5: Langfuse dashboard

Cost monitoring

Cost monitoring helps track expenses at both the aggregate and individual request levels to maintain control over AI infrastructure spending. The platform provides detailed cost breakdowns per model, user, and function call, enabling teams to identify cost-intensive operations and optimize their implementation. This granular cost visibility supports data-driven decisions about model selection, prompt engineering, and resource allocation while staying within budget constraints. Dashboard cost data is provided for estimation purposes; actual charges should be verified through official billing statements.

Figure 6: Cost dashboard

Langfuse latency dashboard

Latency metrics can be monitored across traces and generations for performance optimization. The dashboard shows the following metrics by default, and you can create custom charts and dashboards depending on your needs:

  • P95 Latency by Level (Observations)
  • P95 Latency by Use Case
  • Max Latency by User ID (Traces)
  • Avg Time to First Token by Prompt Name (Observations)
  • P95 Time to First Token by Model
  • P95 Latency by Model
  • Avg Output Tokens per Second by Model

Figure 7: Latency dashboard

Langfuse usage management

This dashboard shows metrics across traces, observations, and scores to help manage resource allocation.

Figure 8: Usage management dashboard

Conclusion

This post demonstrated how to integrate Langfuse with AgentCore for comprehensive observability of AI agents. Users can now track performance, debug interactions, and optimize costs across workflows. We expect more Langfuse observability features and integration options in the future to help scale AI applications.

Start implementing Langfuse with AgentCore today to gain deeper insights into agent performance, track conversation flows, and optimize AI applications. For more information, visit the following resources:


About the authors

Richa Gupta is a Senior Solutions Architect at Amazon Web Services, specializing in AI/ML, generative AI, and agentic AI. She is passionate about helping customers on their AI transformation journey, architecting end-to-end solutions from proof of concept to production deployment, and driving business revenue. Beyond her professional pursuits, Richa likes to make latte art and is an adventure enthusiast.

Ishan Singh is a Senior Generative AI Data Scientist at Amazon Web Services, where he partners with customers to architect innovative and responsible generative AI solutions. With deep expertise in AI and machine learning, Ishan leads the development of production generative AI solutions at scale, with a focus on evaluations and observability. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife, kid, and dog, Beau.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Madhu Samhitha is a Specialist Solutions Architect at Amazon Web Services, focused on helping customers implement generative AI solutions. She combines her knowledge of large language models with strategic innovation to deliver business value. She has a Master's in Computer Science from the University of Massachusetts Amherst and has worked in various industries. Beyond her technical role, Madhu is a trained classical dancer, an art enthusiast, and enjoys exploring national parks.

Marc Klingen is the co-founder and CEO of Langfuse, the open source LLM engineering platform. After building LLM agents in 2023 together with his co-founders, Marc and the team realized that new tooling is necessary to bring agents into production and scale them reliably. With Langfuse they have built the leading open source LLM engineering platform (observability, evaluation, prompt management), with over 18,000 GitHub stars, 14.8M+ SDK installs per month, and 6M+ Docker pulls. Langfuse is used by top engineering teams such as Khan Academy, Samsara, Twilio, and Merck.
