Building a multi-agent voice assistant with Amazon Nova Sonic and Amazon Bedrock AgentCore


Amazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications. It lets users interact with AI through voice in real time, with capabilities for understanding tone, enabling natural conversational flow, and performing actions.

Multi-agent architecture offers a modular, robust, and scalable design pattern for production-level voice assistants. This blog post explores Amazon Nova Sonic voice agent applications and demonstrates how they integrate with Strands Agents framework sub-agents while using Amazon Bedrock AgentCore to create an effective multi-agent system.

Why multi-agent architecture?

Imagine developing a financial assistant application responsible for user onboarding, information collection, identity verification, account inquiries, exception handling, and handing off to human agents based on predefined conditions. As functional requirements expand, the voice agent keeps adding new inquiry types. The system prompt grows large, and the underlying logic becomes increasingly complex. This illustrates a persistent challenge in software development: monolithic designs lead to systems that are difficult to maintain and enhance.

Think of multi-agent architecture as building a team of specialized AI assistants rather than relying on a single do-it-all helper. Just as companies divide responsibilities across different departments, this approach breaks complex tasks into smaller, manageable pieces. Each AI agent becomes an expert in a specific area, whether that’s fact-checking, data processing, or handling specialized requests. For the user, the experience feels seamless: there’s no delay, no change in voice, and no visible handoff. The system works behind the scenes, directing each expert agent to step in at the right moment.

In addition to being modular and robust, multi-agent systems offer advantages similar to a microservice architecture, a popular enterprise software design pattern. They provide scalability, distribution, and maintainability while allowing organizations to reuse agentic workflows already developed for their large language model (LLM)-powered applications.

Sample application

In this blog, we refer to the Amazon Nova Sonic workshop multi-agent lab code, which uses a banking voice assistant as a sample to demonstrate how to deploy specialized agents on Amazon Bedrock AgentCore. It uses Nova Sonic as the voice interface layer, which also acts as an orchestrator that delegates detailed inquiries to sub-agents written in Strands Agents and hosted on AgentCore Runtime. You can find the sample source code on the GitHub repo.

In the banking voice agent sample, the conversation flow begins with a greeting and collecting the user’s name, and then it handles inquiries related to banking or mortgages. We use three secondary-level agents hosted on AgentCore to handle specialized logic:

  • Authenticate sub-agent: Handles user authentication using the account ID and other information
  • Banking sub-agent: Handles account balance checks, statements, and other banking-related inquiries
  • Mortgage sub-agent: Handles mortgage-related inquiries, including refinancing, rates, and repayment options

[Figure: Nova Sonic multi-agent architecture diagram]

Sub-agents are self-contained, handling their own logic such as input validation. For instance, the authentication agent validates account IDs and returns errors to Nova Sonic if needed. This simplifies the reasoning logic in Nova Sonic while keeping business logic encapsulated, similar to modular design patterns in software engineering.
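As a rough illustration of this kind of self-contained validation, the following sketch normalizes a spoken account ID and rejects anything that is not numeric. The helper names (normalize_account_id, validate_account_id), the word-to-digit table, and the shape of the error response are assumptions made for illustration; they are not taken from the workshop code.

```python
import re

# Assumed mapping for spoken digits; a speech transcript may spell numbers out.
_WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def normalize_account_id(raw: str) -> str:
    """Turn input such as 'one two three 4 5' into '12345'."""
    tokens = re.split(r"[\s\-]+", raw.strip().lower())
    # Unknown tokens are kept as-is so that validation can reject them later.
    return "".join(_WORD_TO_DIGIT.get(token, token) for token in tokens if token)


def validate_account_id(raw: str) -> dict:
    """Return a small result dict the sub-agent could pass back to the caller."""
    account_id = normalize_account_id(raw)
    if not re.fullmatch(r"[0-9]+", account_id):
        return {"ok": False, "error": "Account ID must contain digits only."}
    return {"ok": True, "account_id": account_id}
```

Keeping this check inside the sub-agent means the voice layer only has to relay the error message, not understand the validation rule itself.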

Integrate Nova Sonic with AgentCore through tool use events

Amazon Nova Sonic relies on tool use to integrate with agentic workflows. During the Nova Sonic event lifecycle, you can provide tool use configurations through the promptStart event. Each configured tool is designed to be triggered when Sonic receives specific types of input.

For example, in the following Sonic tool configuration sample, tool use is configured to trigger events based on Sonic’s built-in reasoning model, which classifies the inquiry for routing to the banking sub-agent.

[
    {
        "toolSpec": {
            "name": "bankAgent",
            "description": `Use this tool whenever the customer asks about their **bank account balance** or **bank statement**.  
                    It should be triggered for queries such as:  
                    - "What’s my balance?"  
                    - "How much money do I have in my account?"  
                    - "Can I see my latest bank statement?"  
                    - "Show me my account summary."`,
            "inputSchema": {
                "json": JSON.stringify({
                "type": "object",
                "properties": {
                    "accountId": {
                        "type": "string",
                        "description": "This is a user input. It is the bank account Id which is a numeric number."
                    },
                    "query": {
                        "type": "string",
                        "description": "The inquiry to the bank agent such as check account balance, get statement etc."
                    }
                },
                "required": [
                    "accountId", "query"
                ]
                })
            }
        }
    }
]
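If the client application is written in Python rather than JavaScript, the same structure can be assembled with a small helper. This is a minimal sketch: make_tool_spec is a hypothetical helper, and the mortgageAgent name and description in the usage lines are illustrative assumptions rather than values from the workshop code. Only the toolSpec layout (name, description, and inputSchema.json as a serialized JSON string) follows the sample above.

```python
import json
from typing import Dict, List


def make_tool_spec(name: str, description: str,
                   properties: Dict[str, dict], required: List[str]) -> dict:
    """Build one toolSpec entry in the same shape as the sample configuration."""
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            # The schema is passed as a JSON string, mirroring JSON.stringify above.
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


# Hypothetical entry for a mortgage sub-agent; name and wording are assumptions.
mortgage_tool = make_tool_spec(
    name="mortgageAgent",
    description="Use this tool when the customer asks about mortgages.",
    properties={
        "query": {"type": "string", "description": "The mortgage inquiry."}
    },
    required=["query"],
)
tools = [mortgage_tool]
```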

When a user asks Nova Sonic a question such as ‘What’s my account balance?’, Sonic sends a toolUse event to the client application with the specified toolName (for example, bankAgent) defined in the configuration. The application can then invoke the sub-agent hosted on AgentCore to handle the banking logic and return the response to Sonic, which in turn generates an audio reply for the user.

{
  "event": {
    "toolUse": {
      "completionId": "UUID",
      "content": "{\"accountId\":\"one two three four five\",\"query\":\"check account balance\"}",
      "contentId": "UUID",
      "promptName": "UUID",
      "role": "TOOL",
      "sessionId": "UUID",
      "toolName": "bankAgent",
      "toolUseId": "UUID"
    }
  }
}
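On the client side, handling this event comes down to parsing the content string, picking a sub-agent by toolName, and passing the result back. The sketch below shows one way to do that under stated assumptions: AGENT_ROUTES, handle_tool_use, the route values, and the returned dictionary are all hypothetical, and the invoke callable stands in for whatever code actually calls the sub-agent on AgentCore. The application would still need to wrap the result in the tool result events that Nova Sonic expects.

```python
import json
from typing import Callable, Dict

# Hypothetical routing table: Sonic toolName (lowercased) -> sub-agent identifier.
AGENT_ROUTES: Dict[str, str] = {
    "bankagent": "banking-sub-agent",
    "mortgageagent": "mortgage-sub-agent",
    "authenticateagent": "authenticate-sub-agent",
}


def handle_tool_use(event: dict, invoke: Callable[[str, dict], str]) -> dict:
    """Route a toolUse event to a sub-agent and package its reply."""
    tool_use = event["event"]["toolUse"]
    tool_name = tool_use["toolName"]
    # The content field is itself a JSON string and must be parsed separately.
    payload = json.loads(tool_use["content"])

    # Case-insensitive lookup is a defensive assumption, not documented behavior.
    target = AGENT_ROUTES.get(tool_name.lower())
    if target is None:
        result = f"No sub-agent is registered for tool '{tool_name}'."
    else:
        result = invoke(target, payload)

    # Simplified result shape; toolUseId ties the reply back to the request.
    return {
        "toolUseId": tool_use["toolUseId"],
        "content": json.dumps({"result": result}),
    }
```

Passing invoke in as a parameter keeps the routing logic testable without any network calls.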

Sub-agent on AgentCore

The following sample showcases the banking sub-agent developed using the Strands Agents framework, specifically configured for deployment on Bedrock AgentCore. It uses Nova Lite through Amazon Bedrock as its reasoning model, providing effective reasoning capabilities with minimal latency. The agent implementation features a system prompt that defines its banking assistant responsibilities, complemented by two specialized tools: one for account balance inquiries and another for bank statement retrieval.

from strands import Agent, tool
import json
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands.models import BedrockModel

app = BedrockAgentCoreApp()

@tool
def get_account_balance(account_id: str) -> dict:
    """Get account balance for given account Id

    Args:
        account_id: Bank account Id
    """

    # The actual implementation will retrieve information from a database API or another backend service.
    result = {}  # Placeholder so the sample runs; replace with the backend response.

    return {"result": result}

@tool
def get_statement(account_id: str, year_and_month: str) -> dict:
    """Get account statement for a given year and month

    Args:
        account_id: Bank account Id
        year_and_month: Year and month of the bank statement. For example: 2025_08 or August 2025
    """
    # The actual implementation will retrieve information from a database API or another backend service.
    result = {}  # Placeholder so the sample runs; replace with the backend response.

    return {"result": result}


# Specify Bedrock LLM for the Agent
bedrock_model = BedrockModel(
    model_id="amazon.nova-lite-v1:0",
)
# System prompt
system_prompt = '''
You are a banking agent. You will receive requests that include:  
- `account_id`  
- `query` (the inquiry type, such as **balance** or **statement**, plus any additional details like month).  

## Instructions
1. Use the provided `account_id` and `query` to call the tools.  
2. The tool will return a JSON response.  
3. Summarize the result in 2–3 sentences.  
   - For a **balance inquiry**, give the account balance with currency and date.  
   - For a **statement inquiry**, provide opening balance, closing balance, and number of transactions.  
4. Do not return raw JSON. Always answer in natural language.  
'''

# Create an agent with tools, LLM, and system prompt
agent = Agent(
    tools=[get_account_balance, get_statement],
    model=bedrock_model,
    system_prompt=system_prompt
)

@app.entrypoint
def banking_agent(payload):
    response = agent(json.dumps(payload))
    return response.message['content'][0]['text']
    
if __name__ == "__main__":
    app.run()
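Once a sub-agent like this is deployed, the client application needs a way to call it. The following is a minimal sketch, assuming the boto3 bedrock-agentcore client and its invoke_agent_runtime operation with the agentRuntimeArn, runtimeSessionId, and payload parameters, and assuming the reply is returned as a readable stream under the response key. The function name invoke_sub_agent and the way the session ID is generated are illustrative; check the current SDK documentation for the exact parameters and response shape before relying on it.

```python
import json
import uuid
from typing import Optional


def invoke_sub_agent(client, agent_runtime_arn: str, payload: dict,
                     session_id: Optional[str] = None) -> str:
    """Send a JSON payload to a sub-agent on AgentCore Runtime and return its reply.

    `client` is assumed to be boto3.client("bedrock-agentcore"); it is passed in
    so the function can be exercised with a stub during local testing.
    """
    # Assumption: a UUID string (36 characters) is long enough for a session id.
    session_id = session_id or str(uuid.uuid4())

    response = client.invoke_agent_runtime(
        agentRuntimeArn=agent_runtime_arn,
        runtimeSessionId=session_id,
        payload=json.dumps(payload).encode("utf-8"),
    )
    # Assumption: the reply body is a stream-like object with a read() method.
    body = response["response"].read()
    return body.decode("utf-8")
```

Reusing the same session_id across calls is one way to keep a multi-turn conversation tied to the same runtime session, while omitting it treats every call as independent.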

Best practices for voice-based multi-agent systems

Multi-agent architecture provides exceptional flexibility and a modular design approach, allowing developers to structure voice assistants efficiently and potentially reuse existing specialized agent workflows. When implementing voice-first experiences, there are important best practices to consider that address the unique challenges of this modality.

  • Balance flexibility and latency: Although the ability to invoke sub-agents using Nova Sonic tool use events creates powerful capabilities, it can introduce additional latency to voice responses. For use cases that require a synchronous experience, each agent handoff represents a potential delay point in the interaction flow. Therefore, it’s important to design with response time in mind.
  • Optimize model selection for sub-agents: Starting with smaller, more efficient models like Nova Lite for sub-agents can significantly reduce latency while still handling specialized tasks effectively. Reserve larger, more capable models for complex reasoning or when sophisticated natural language understanding is essential.
  • Craft voice-optimized responses: Voice assistants perform best with concise, focused responses that can be followed by more details when needed. This approach not only improves latency but also creates a more natural conversational flow that aligns with human expectations for verbal communication.
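One way to design with response time in mind is to give each sub-agent call a time budget and fall back to a short spoken message when the budget is exceeded. The sketch below uses only the Python standard library. The function name call_with_budget, the shared executor, and the fallback text are assumptions for illustration; this is not a pattern taken from the workshop code, and the right budget depends on the application.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Shared worker pool so a slow call does not block the voice loop.
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_budget(fn: Callable[[dict], str], payload: dict,
                     budget_seconds: float, fallback: str) -> str:
    """Run fn(payload) but return `fallback` if it takes longer than the budget."""
    future = _executor.submit(fn, payload)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The worker keeps running in the background; only the wait is abandoned.
        return fallback
```

A fallback such as "I'm still looking that up, one moment" keeps the conversation moving, although the application then has to decide what to do with the late result.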

Consider stateless vs. stateful sub-agent design

Stateless sub-agents handle each request independently, without retaining memory of past interactions or session-level state. They’re simple to implement, easy to scale, and work well for straightforward, one-off tasks. However, they cannot provide context-aware responses unless external state management is introduced.

Stateful sub-agents, on the other hand, maintain memory across interactions to support context-aware responses and session-level state. This enables more personalized and cohesive user experiences, but comes with added complexity and resource requirements. They’re best suited for scenarios involving multi-turn interactions and user or session-level context caching.
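The difference can be sketched in a few lines. In this simplified illustration, the stateless handler depends only on the incoming payload, while the stateful one keeps a per-session history in memory. The names stateless_handler and StatefulHandler, the payload keys, and the in-memory dictionary are assumptions for illustration; a production system would more likely keep session state in an external store.

```python
from collections import defaultdict
from typing import Dict, List


def stateless_handler(payload: dict) -> str:
    """Every call is independent: the answer depends only on this payload."""
    return f"Handling '{payload['query']}' for account {payload['account_id']}."


class StatefulHandler:
    """Keeps a per-session list of earlier queries to provide context."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = defaultdict(list)

    def handle(self, session_id: str, payload: dict) -> str:
        history = self._history[session_id]
        previous = history[-1] if history else None
        history.append(payload["query"])
        if previous is None:
            return f"Handling '{payload['query']}' as the first request in this session."
        return f"Handling '{payload['query']}' after your earlier request '{previous}'."
```

The stateful version can refer back to earlier turns, at the cost of memory that grows with the number of active sessions and has to be cleaned up or persisted.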

Conclusion

Multi-agent architectures unlock flexibility, scalability, and accuracy for complex AI-driven workflows. By combining the Nova Sonic conversational capabilities with the orchestration power of Bedrock AgentCore, you can build intelligent, specialized agents that work together seamlessly. If you’re exploring ways to enhance your AI applications, multi-agent patterns with Nova Sonic and AgentCore are a powerful approach worth testing.

Learn more about Amazon Nova Sonic by visiting the User Guide, building your application with the sample applications, and exploring the Nova Sonic workshop to get started. You can also refer to the technical report and model card for additional benchmarks.


About the authors

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.
