Build a serverless audio summarization solution with Amazon Bedrock and Whisper

Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With advances in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.
Protecting personally identifiable information (PII) is a critical aspect of data security, driven by both ethical responsibilities and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper foundation model (FM) Whisper Large V3 Turbo, available in Amazon Bedrock Marketplace, which offers access to over 140 models through a dedicated offering, to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.
Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.
In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.
Solution overview
The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.
The workflow consists of the following steps:
- The React application is hosted in an S3 bucket and served to users through CloudFront for fast, global access. API Gateway handles interactions between the frontend and backend services.
- Users upload audio or video files directly from the app. These recordings are stored in a designated S3 bucket for processing.
- An Amazon EventBridge rule detects the S3 upload event and triggers the Step Functions state machine, initiating the AI-powered processing pipeline (see the AWS CDK sketch after this list).
- The state machine performs audio transcription, summarization, and redaction by orchestrating multiple Amazon Bedrock models in sequence. It uses Whisper for transcription, Anthropic's Claude for summarization, and Amazon Bedrock Guardrails to redact sensitive data.
- The redacted summary is returned to the frontend application and displayed to the user.
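The upload trigger in step 3 is the glue between storage and orchestration. The repository defines this wiring with the AWS CDK; the following is a minimal sketch of how such a rule could look, assuming the uploads bucket has EventBridge notifications enabled (construct and variable names here are illustrative, not taken from the repository).
from aws_cdk import Stack, aws_events as events, aws_events_targets as targets
from constructs import Construct

class AudioUploadTriggerStack(Stack):
    """Illustrative wiring of an S3 upload event to the processing state machine."""

    def __init__(self, scope: Construct, construct_id: str, uploads_bucket, state_machine, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Match EventBridge "Object Created" events emitted by the uploads bucket
        upload_rule = events.Rule(
            self, "AudioUploadRule",
            event_pattern=events.EventPattern(
                source=["aws.s3"],
                detail_type=["Object Created"],
                detail={"bucket": {"name": [uploads_bucket.bucket_name]}},
            ),
        )

        # Start the Step Functions state machine for each new recording
        upload_rule.add_target(targets.SfnStateMachine(state_machine))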
The following diagram illustrates the state machine workflow.
The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio and video recordings:
- A Lambda function is triggered to gather input details (for example, the Amazon S3 object path and metadata) and prepare the payload for transcription.
- The payload is sent to the OpenAI Whisper Large V3 Turbo model through Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
- The raw transcript is passed to Anthropic's Claude 3.5 Sonnet through Amazon Bedrock, which produces a concise and coherent summary of the conversation or content.
- A second Lambda function validates and forwards the summary to the redaction step.
- The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive data.
- The redacted summary is stored or returned to the frontend application through an API, where it is displayed to the user.
Prerequisites
Before you start, make sure you have the prerequisites in place that are described in the following sections.
Create a guardrail in the Amazon Bedrock console
For instructions on creating guardrails in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:
- Enable PII detection and handling
- Set the PII action to Redact
- Add the relevant PII types, such as:
- Names and identities
- Phone numbers
- Email addresses
- Physical addresses
- Financial information
- Other sensitive personal information
After you deploy the guardrail, note its Amazon Resource Name (ARN); you will use it when deploying the solution.
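If you prefer to create the guardrail programmatically instead of in the console, a minimal boto3 sketch along the lines of the settings above might look like the following. The guardrail name, the blocked messages, and the exact set of PII entity types are assumptions; adjust them to your requirements.
import boto3

bedrock = boto3.client("bedrock")

# Create a guardrail that anonymizes (redacts) common PII types
response = bedrock.create_guardrail(
    name="audio-summary-pii-guardrail",
    blockedInputMessaging="Sorry, this input cannot be processed.",
    blockedOutputsMessaging="Sorry, this output cannot be returned.",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "ADDRESS", "action": "ANONYMIZE"},
            {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "ANONYMIZE"},
        ]
    },
)

print(response["guardrailId"], response["guardrailArn"])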
Deploy the Whisper model
Complete the following steps to deploy the Whisper Large V3 Turbo model:
- On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
- Search for and choose Whisper Large V3 Turbo.
- On the options menu (three dots), choose Deploy.
- Modify the endpoint name, number of instances, and instance type to suit your specific use case. For this post, we use the default settings.
- Adjust the Advanced settings section to suit your use case. For this post, we use the default settings.
- Choose Deploy.
This creates a new AWS Identity and Access Management (IAM) role and deploys the model.
You can choose Marketplace deployments in the navigation pane, and in the Managed deployments section, you can see the endpoint status as Creating. Wait for the endpoint to finish deployment and the status to change to In Service, then copy the endpoint name; you will use it when deploying the solution infrastructure.
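Because the Marketplace deployment is backed by a SageMaker endpoint (the transcription function later invokes it through the SageMaker runtime), you can also confirm its status from code. The following is a small sketch; the endpoint name is a placeholder.
import boto3

ENDPOINT_NAME = "whisper-large-v3-turbo-endpoint"  # replace with your endpoint name

sagemaker = boto3.client("sagemaker")

# Check whether the endpoint behind the Marketplace deployment is ready to serve traffic
status = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)["EndpointStatus"]
print(f"Endpoint {ENDPOINT_NAME} is {status}")  # expect "InService" before testing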
Deploy the solution infrastructure
In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.
We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:
- React frontend application
- Backend infrastructure
- S3 buckets for storing uploads and processed results
- Step Functions state machine with Lambda functions for audio processing and PII redaction
- API Gateway endpoints for handling requests
- IAM roles and policies for secure access
- CloudFront distribution for hosting the frontend
Implementation deep dive
The backend is composed of a sequence of Lambda functions, each handling a specific stage of the audio processing pipeline:
- Upload handler – Receives audio files and stores them in Amazon S3
- Transcription with Whisper – Converts speech to text using the Whisper model
- Speaker detection – Differentiates and labels individual speakers within the audio
- Summarization using Amazon Bedrock – Extracts and summarizes key points from the transcript
- PII redaction – Uses Amazon Bedrock Guardrails to remove sensitive information for privacy compliance
Let's examine some of the key components.
The transcription Lambda function uses the Whisper model to convert audio files to text:
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio bytes to a hex string
    hex_audio = audio_chunk.hex()

    # Create the payload for the Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }

    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )

    # Parse the transcription response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    transcription_text = response_body['text']

    return transcription_text
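As a hypothetical usage example (the bucket, key, and endpoint names are placeholders), the function above could be called on an object downloaded from the uploads bucket like this:
# Download the uploaded recording from S3 and transcribe it
s3 = boto3.client("s3")
audio_bytes = s3.get_object(Bucket="my-uploads-bucket", Key="recordings/meeting.mp3")["Body"].read()

transcript = transcribe_with_whisper(audio_bytes, "whisper-large-v3-turbo-endpoint")
print(transcript)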
We use Amazon Bedrock to generate concise summaries from the transcriptions:
bedrock_runtime = boto3.client("bedrock-runtime")

def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

    # Call Amazon Bedrock for summarization
    # (Claude 3 models on Amazon Bedrock require the Messages API request format)
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    # Extract and return the summary text
    result = json.loads(response['body'].read())
    return result['content'][0]['text']
A critical component of our solution is the automated redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:
def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to the ApplyGuardrail API requirements
    formatted_content = [{"text": {"text": content}}]

    # Call the guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Using the OUTPUT source for the correct flow
        content=formatted_content
    )

    # Extract the redacted text from the response
    if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
        if len(response['outputs']) > 0:
            output = response['outputs'][0]
            if 'text' in output and isinstance(output['text'], str):
                return output['text']

    # Return the original content if the guardrail did not intervene
    return content
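As a quick illustration (the input text and environment variable name are hypothetical), calling the helper with the guardrail ID you noted earlier might look like this:
import os

bedrock_runtime = boto3.client("bedrock-runtime")

redacted = apply_guardrail(
    bedrock_runtime,
    "Follow up with Jane Doe at jane@example.com or 555-0123 about the renewal.",
    os.environ["GUARDRAIL_ID"],  # guardrail ID noted after creation
)
print(redacted)  # PII is replaced with type indicators such as {NAME}, {EMAIL}, {PHONE}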
When PII is detected, it is replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive data.
To manage the multi-step processing pipeline, we use Step Functions to orchestrate the Lambda functions:
{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}
This workflow makes sure each step completes successfully before proceeding to the next, with error handling and retry logic built in (a sample retry configuration is sketched below).
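The definition above omits the retry configuration for brevity. As an illustration only (the error list and backoff values are assumptions, not the deployed settings), a Retry block that could be added to each Task state looks like the following.
"Retry": [
  {
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException", "States.TaskFailed"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0
  }
]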
Test the solution
After you have successfully completed the deployment, you can use the CloudFront URL to test the solution's functionality.
Security considerations
Security is a critical aspect of this solution, and we have implemented several best practices to support data protection and compliance:
- Sensitive data redaction – Automatically redact PII to protect user privacy.
- Fine-grained IAM permissions – Apply the principle of least privilege across AWS services and resources.
- Amazon S3 access controls – Use strict bucket policies to limit access to authorized users and roles.
- API security – Secure API endpoints using Amazon Cognito for user authentication (optional but recommended).
- CloudFront protection – Enforce HTTPS and apply modern TLS protocols to facilitate secure content delivery.
- Amazon Bedrock data security – Amazon Bedrock (including Amazon Bedrock Marketplace) protects customer data and doesn't send data to providers or train using customer data. This makes sure your proprietary information remains secure when using AI capabilities.
Clean up
To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you're done:
- Delete the Amazon Bedrock guardrail:
- On the Amazon Bedrock console, in the navigation pane, choose Guardrails.
- Select your guardrail, then choose Delete.
- Delete the Whisper Large V3 Turbo model deployed through Amazon Bedrock Marketplace:
- On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane.
- In the Managed deployments section, select the deployed endpoint and choose Delete.
- Delete the AWS CDK stack by running the cdk destroy command, which deletes the AWS infrastructure.
Conclusion
This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we've built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.
The automated PII redaction feature supports compliance with privacy regulations, making this solution well suited for regulated industries such as healthcare, finance, and legal services where data protection is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.
About the Authors
Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives generative AI adoption to the cloud. Previously, Kaiyin worked in the Smart Home domain, assisting customers in integrating voice and IoT technologies.
Sid Vantair is a Solutions Architect with AWS covering Strategic accounts. He thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.