Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

Preserving and using institutional knowledge is crucial for organizational success and adaptability. This collective wisdom, made up of insights and experiences that employees gather over time, often exists as tacit knowledge that is passed down informally. Formalizing and documenting this resource can help organizations maintain institutional memory, drive innovation, improve decision-making, and accelerate onboarding for new employees. However, capturing and documenting this knowledge effectively is difficult. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or may lack the time to document their expertise comprehensively.

This post introduces a voice-based application workflow that uses Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API. It also provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI integrates with Amazon Transcribe to give users a real-time transcription experience. As employees speak, they can see their words converted to text in real time, which allows immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we've created a complete solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also improves the quality and accessibility of the captured knowledge, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution combines several AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation. The knowledge capture process is built from the following components:

User interface – A React-based front end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
Intelligent processing – A Lambda function calls generative AI models through Amazon Bedrock to analyze and summarize the transcribed text. It goes beyond simple summarization by performing the following actions:
- Extracting key concepts and terminologies.
- Structuring the information into a coherent, well-organized document.
Secure storage – Raw audio files, processed information, summaries, and generated content are stored in Amazon S3, which provides scalable and durable storage for this knowledge repository. S3 bucket policies and encryption are used to enforce data security and compliance.
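A common bucket-policy control is to deny any request that does not use TLS. The following sketch builds such a policy document. The bucket name is a placeholder, and this is an illustration rather than the policy that the sample's AWS CDK stack actually deploys. In the AWS CDK, setting `enforceSSL: true` on an `s3.Bucket` adds an equivalent statement.

```typescript
// Illustrative IAM policy types (only the fields this sketch uses)
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: "*";
  Action: string;
  Resource: string[];
  Condition: Record<string, Record<string, string>>;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Builds a bucket policy that rejects every request not sent over HTTPS.
// The bucket name is a placeholder supplied by the caller.
function denyInsecureTransportPolicy(bucketName: string): PolicyDocument {
  const bucketArn = `arn:aws:s3:::${bucketName}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyInsecureTransport",
        Effect: "Deny",
        Principal: "*",
        Action: "s3:*",
        // Cover both bucket-level and object-level operations
        Resource: [bucketArn, `${bucketArn}/*`],
        // aws:SecureTransport is "false" when the request did not use TLS
        Condition: { Bool: { "aws:SecureTransport": "false" } },
      },
    ],
  };
}

console.log(JSON.stringify(denyInsecureTransportPolicy("example-artifacts-bucket"), null, 2));
```

The resulting JSON can be attached as the bucket policy. Because it is an explicit `Deny`, it overrides any `Allow` that a role or user might otherwise have for plain-HTTP requests.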

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of a more comprehensive identity management solution such as Amazon Cognito. We chose this approach for several reasons:

Simplicity – As a sample application, it doesn't need full user management or login functionality
Minimal user friction – Users don't need to create accounts or log in, which simplifies the user experience
Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
Temporary credential management – Businesses can use this approach to provide secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, production applications, especially those that handle sensitive data or need user-specific functionality, would typically call for a more robust identity solution such as Amazon Cognito.
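Because the front end works with temporary credentials, it has to fetch new ones before the current ones expire. The following is a minimal client-side cache sketch. It assumes the authorization endpoint returns an `Expiration` timestamp alongside the `AccessKeyId`, `SecretAccessKey`, `SessionToken`, and `Region` fields used by the Transcribe client in this post. The `Expiration` field, the five-minute margin, and the `CredentialCache` class are all assumptions.

```typescript
// Assumed shape of the authorization function's response. Expiration is a
// hypothetical field; the other four mirror the Transcribe client snippet.
interface TemporaryCredentials {
  AccessKeyId: string;
  SecretAccessKey: string;
  SessionToken: string;
  Region: string;
  Expiration: string; // ISO 8601 timestamp
}

// Refresh early so a streaming session doesn't start with nearly expired credentials
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

function needsRefresh(
  creds: TemporaryCredentials,
  now: Date = new Date(),
  marginMs: number = REFRESH_MARGIN_MS,
): boolean {
  const expiresAt = Date.parse(creds.Expiration);
  // Treat an unparseable timestamp as already expired
  if (Number.isNaN(expiresAt)) return true;
  return expiresAt - now.getTime() <= marginMs;
}

// Caches credentials and calls the (hypothetical) fetcher only when needed
class CredentialCache {
  private creds: TemporaryCredentials | undefined;
  private readonly fetcher: () => Promise<TemporaryCredentials>;

  constructor(fetcher: () => Promise<TemporaryCredentials>) {
    this.fetcher = fetcher;
  }

  async get(now: Date = new Date()): Promise<TemporaryCredentials> {
    let creds = this.creds;
    if (creds === undefined || needsRefresh(creds, now)) {
      creds = await this.fetcher();
      this.creds = creds;
    }
    return creds;
  }
}
```

For example, `new CredentialCache(fetchCredentialsFromApi)`, where `fetchCredentialsFromApi` is a hypothetical wrapper around the API Gateway call, can be queried each time a transcription session starts.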

The following diagram illustrates the architecture of our solution.

[Figure: Solution architecture]

The workflow consists of the following steps:

Users access the front-end UI application, which is distributed through CloudFront
The React web application sends an initial request to Amazon API Gateway
API Gateway forwards the request to the authorization Lambda function
The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
The authorization function sends temporary credentials back to the front-end application through API Gateway
With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
After recording and transcription, the user sends (through the front-end UI) the transcribed text and audio files to the backend through API Gateway
API Gateway routes the authorized request (containing the transcribed text and audio files) to the orchestration Lambda function
The orchestration function sends the transcribed text to Amazon Bedrock for summarization
The orchestration function receives the summarized text from Amazon Bedrock and uses it to generate the document content
The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket
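The orchestration function's role in the last three steps can be sketched with its AWS dependencies injected as interfaces, which keeps the control flow testable without AWS access. The `Summarizer`, `PdfRenderer`, and `ArtifactStore` interfaces, the request shape, and the S3 key layout are hypothetical. In a deployed function they would wrap Amazon Bedrock, a PDF library, and Amazon S3.

```typescript
// Hypothetical request payload forwarded by API Gateway
interface AudioClip {
  data: Uint8Array;
  contentType: string;
}

interface GenerateDocumentRequest {
  question: string;
  transcript: string;
  fileName: string; // assumed to be validated upstream
  audioClips: AudioClip[];
}

// Hypothetical dependency interfaces (Bedrock, PDF rendering, S3)
interface Summarizer {
  summarize(question: string, transcript: string): Promise<string>;
}
interface PdfRenderer {
  render(title: string, body: string): Promise<Uint8Array>;
}
interface ArtifactStore {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
}

interface OrchestrationResult {
  documentKey: string;
  audioKeys: string[];
}

async function orchestrate(
  req: GenerateDocumentRequest,
  deps: { summarizer: Summarizer; renderer: PdfRenderer; store: ArtifactStore },
): Promise<OrchestrationResult> {
  // Send the transcript for summarization and receive the generated text
  const summary = await deps.summarizer.summarize(req.question, req.transcript);

  // Render the generated content as a PDF titled with the question
  const pdf = await deps.renderer.render(req.question, summary);

  // Store the PDF and each recorded clip in the artifacts bucket
  const documentKey = `documents/${req.fileName}.pdf`;
  await deps.store.put(documentKey, pdf, "application/pdf");

  const audioKeys: string[] = [];
  for (let i = 0; i < req.audioClips.length; i++) {
    const clip = req.audioClips[i];
    const key = `audio/${req.fileName}/clip-${i + 1}`;
    await deps.store.put(key, clip.data, clip.contentType);
    audioKeys.push(key);
  }
  return { documentKey, audioKeys };
}
```

Injecting the dependencies also makes it easy to swap in a different FM or storage layout without touching the control flow.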

Prerequisites

You need the following prerequisites:

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

To deploy the solution, complete the following steps:

Clone the GitHub repository: genai-knowledge-capture-webapp
Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to react-app-deploy.ts in the GitHub repo.

Run npm install to install the dependencies
Run cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, AWS CodeBuild builds and deploys the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that the AWS CDK outputs.

Amazon Transcribe streaming within the React application

Our solution's front end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the @aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can see their spoken words converted to text instantly.

Real-time transcription offers several benefits for knowledge capture:

With immediate feedback, speakers can correct or clarify their statements in the moment
The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
It reduces the cognitive load on the speaker, who doesn't need to worry about taking notes or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

import {
  StartStreamTranscriptionCommand,
  TranscribeStreamingClient,
} from "@aws-sdk/client-transcribe-streaming";

// The following runs inside an async function of the useAudioTranscription hook.
// Create the Transcribe client with the temporary credentials from the authorization function
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create the Transcribe start command; the audio format must match the encoded stream
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start the Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process the Transcribe result stream as events arrive
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}
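The snippet above hands each event to handleTranscriptEvent. Amazon Transcribe streaming returns partial results that are revised as more audio arrives, followed by a final result for each segment. The UI should therefore overwrite partial text rather than append it. The following is a minimal sketch of that logic. It uses local types that mirror the `Results`, `IsPartial`, and `Alternatives` fields of the SDK's transcript events. `applyTranscriptEvent` and `TranscriptState` are hypothetical names, not the repo's implementation.

```typescript
// Local types mirroring the transcript event fields this sketch reads
interface Alternative {
  Transcript?: string;
}
interface Result {
  ResultId?: string;
  IsPartial?: boolean;
  Alternatives?: Alternative[];
}
interface TranscriptResultStreamEvent {
  TranscriptEvent?: { Transcript?: { Results?: Result[] } };
}

interface TranscriptState {
  finalText: string; // committed, non-partial segments
  partialText: string; // latest in-progress hypothesis
}

function applyTranscriptEvent(
  state: TranscriptState,
  event: TranscriptResultStreamEvent,
): TranscriptState {
  const results = event.TranscriptEvent?.Transcript?.Results ?? [];
  let finalText = state.finalText;
  let partialText = state.partialText;
  for (const result of results) {
    const text = result.Alternatives?.[0]?.Transcript ?? "";
    if (result.IsPartial) {
      // Partial results are revised as audio arrives: overwrite, don't append
      partialText = text;
    } else {
      // A final result supersedes the partial hypothesis for its segment
      finalText = finalText ? `${finalText} ${text}` : text;
      partialText = "";
    }
  }
  return { finalText, partialText };
}
```

In a React hook, a handler could call `setTranscript((prev) => applyTranscriptEvent(prev, event))`, where `setTranscript` is a hypothetical state setter. The UI would then render `finalText` followed by `partialText`.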

For best results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.
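The `getAudioStreamGenerator()` call in the snippet above has to produce audio in the format declared by `MediaEncoding` and `MediaSampleRateHertz`. The following is a minimal sketch of that step. It assumes `MediaEncoding` is `"pcm"` (16-bit signed little-endian samples) and that microphone audio arrives as `Float32Array` buffers from the Web Audio API. `float32ToPcm16` and `audioStreamFromChunks` are hypothetical names, not the code in useAudioProcessing.ts. The rate passed as `MediaSampleRateHertz` must match the rate at which the samples were captured.

```typescript
// Shape of one AudioStream event accepted by StartStreamTranscriptionCommand
interface AudioStreamEvent {
  AudioEvent: { AudioChunk: Uint8Array };
}

// Convert Web Audio float samples in [-1, 1] to 16-bit signed little-endian PCM
function float32ToPcm16(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to avoid wrap-around on out-of-range samples
    const s = Math.max(-1, Math.min(1, samples[i]));
    // int16 has one more negative step than positive
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true); // true = little-endian
  }
  return new Uint8Array(buffer);
}

// Wrap an async source of microphone buffers as the iterable used for AudioStream
async function* audioStreamFromChunks(
  chunks: AsyncIterable<Float32Array>,
): AsyncGenerator<AudioStreamEvent> {
  for await (const samples of chunks) {
    yield { AudioEvent: { AudioChunk: float32ToPcm16(samples) } };
  }
}
```

Converting each buffer as it arrives keeps latency low, because Transcribe can start returning partial results before the speaker finishes.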

Use the application

After deployment, open the ReactAppUrl link (https://<cloudfront domain name>.cloudfront.net) in your browser. The solution supports Chrome, Firefox, Edge, Safari, and Brave on Mac and Windows. A web UI opens, as shown in the following screenshot.

[Figure: Application page]

To use this application, complete the following steps:

Enter a question or topic.
Enter a file name for the document.
Choose Start Transcription and start recording your input for the given question or topic. The transcribed text is shown in the Transcription box in real time.
After recording, you can edit the transcribed text.
You can also choose the play icon to play the recorded audio clips.
Choose Generate Document to invoke the backend service, which generates a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.
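Before the front end calls the backend, it is worth validating the inputs from the steps above. The following sketch assumes a JSON body with hypothetical field names (`question`, `answer`, `documentName`). It also shows one way to turn a free-form file name into a safe object-key segment. It is not the sample's actual API contract.

```typescript
// Hypothetical JSON body for the Generate Document call
interface GenerateDocumentBody {
  question: string;
  answer: string;
  documentName: string;
}

// Reduce a user-entered file name to a safe key segment: drop a trailing
// ".pdf", replace runs of characters other than letters, digits, "-" and "_"
// with "-", and trim leading/trailing hyphens.
function sanitizeFileName(input: string, maxLength = 100): string {
  const cleaned = input
    .trim()
    .replace(/\.pdf$/i, "")
    .replace(/[^A-Za-z0-9_-]+/g, "-")
    .slice(0, maxLength)
    .replace(/^-+|-+$/g, "");
  return cleaned || "document";
}

function buildGenerateDocumentBody(
  question: string,
  transcript: string,
  fileName: string,
): GenerateDocumentBody {
  const q = question.trim();
  const a = transcript.trim();
  if (!q) throw new Error("A question or topic is required.");
  if (!a) throw new Error("The transcription is empty; record or type an answer first.");
  return { question: q, answer: a, documentName: sanitizeFileName(fileName) };
}
```

Restricting the character set on the client keeps path separators such as `../` out of the object keys. The backend should still validate the name independently, because client-side checks can be bypassed.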

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM performs the following actions:

Organizes the content into logical sections with appropriate headings
Identifies and highlights important concepts or terminologies
Generates a brief executive summary at the beginning of the document
Applies consistent formatting and styling
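One way to get the behaviors listed above from an FM is to spell them out in the prompt. The following sketch builds such a prompt. The wording, the XML-style tags, and the Markdown output format are assumptions, not the sample application's actual prompt. The resulting string could be sent as a user message with the Amazon Bedrock Runtime Converse API (`ConverseCommand` in `@aws-sdk/client-bedrock-runtime`).

```typescript
// Builds a document-generation prompt from a question and its transcript.
// The instruction wording is illustrative, not the sample's real prompt.
function buildDocumentPrompt(question: string, transcript: string): string {
  const instructions = [
    "Organize the content into logical sections with appropriate headings.",
    "Identify and highlight important concepts or terminologies.",
    "Begin with a brief executive summary.",
    "Apply consistent formatting; use Markdown headings and bullet lists.",
    "Use only information stated in the transcript; do not add facts.",
  ];
  return [
    "You are documenting institutional knowledge from an expert's spoken answer.",
    "",
    // Tags separate the user's content from the instructions
    `<question>${question.trim()}</question>`,
    `<transcript>${transcript.trim()}</transcript>`,
    "",
    "Write a professional document that answers the question:",
    ...instructions.map((line, i) => `${i + 1}. ${line}`),
  ].join("\n");
}
```

Telling the model to use only the transcript's content matters here. The goal is to preserve what the expert said, not to let the FM fill gaps with plausible-sounding material.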

The audio files and generated documents are stored in a dedicated S3 bucket with appropriate encryption and access controls in place, as shown in the following screenshot.

After you generate the document, choose View Document to open the generated PDF in your browser. The PDF is accessed through a presigned URL.

[Figure: Artifacts S3 bucket contents]

Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

Create a custom vocabulary file with your specialized terms
Use the Amazon Transcribe CreateVocabulary API to add this vocabulary to your account
Specify the custom vocabulary (the VocabularyName parameter) in your transcription requests
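The first step can be scripted. The following sketch writes a custom vocabulary in the tab-separated table format. It assumes a header row with `Phrase` and `DisplayAs` columns, with hyphens replacing spaces in multi-word phrases. Check the Amazon Transcribe custom vocabulary documentation for the full character and formatting rules for your language, because this sketch does not enforce them. `toVocabularyTable` and `VocabularyTerm` are hypothetical names.

```typescript
// A term as it is spoken, plus how it should appear in transcripts
interface VocabularyTerm {
  phrase: string;
  displayAs?: string;
}

// Emit a tab-separated vocabulary table (header row plus one row per term).
// Multi-word phrases are joined with hyphens; DisplayAs defaults to the
// phrase itself with whitespace normalized.
function toVocabularyTable(terms: VocabularyTerm[]): string {
  const rows = terms.map(({ phrase, displayAs }) => {
    const spoken = phrase.trim().split(/\s+/).join("-");
    const shown = (displayAs ?? phrase).trim().replace(/\s+/g, " ");
    return `${spoken}\t${shown}`;
  });
  return ["Phrase\tDisplayAs", ...rows].join("\n") + "\n";
}

console.log(toVocabularyTable([
  { phrase: "run book", displayAs: "runbook" },
  { phrase: "Kubernetes operator" },
]));
```

The remaining steps would proceed as follows:

1. Upload the file to Amazon S3.
2. Create the vocabulary with `CreateVocabulary`, passing the file's S3 URI as `VocabularyFileUri`.
3. When the vocabulary is ready, pass its name as `VocabularyName` in the `StartStreamTranscriptionCommand` parameters.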

Asynchronous file uploads

To handle large audio files or improve the user experience, implement an asynchronous upload process:

Create a separate Lambda function to process uploaded files
Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
Invoke the upload Lambda function using S3 Event Notifications
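In the last step, the upload Lambda function receives an S3 event notification. Object keys in these notifications are URL-encoded, and spaces arrive as `+`, so the keys must be decoded before use. The following is a minimal sketch that extracts newly created objects under a hypothetical `uploads/` prefix. It uses a local subset of the event shape rather than the `@types/aws-lambda` definitions.

```typescript
// Minimal subset of the S3 event notification delivered to Lambda
interface S3EventRecord {
  eventName: string; // for example "ObjectCreated:Put"
  s3: { bucket: { name: string }; object: { key: string; size?: number } };
}
interface S3Event {
  Records: S3EventRecord[];
}

interface UploadedObject {
  bucket: string;
  key: string;
  size: number;
}

// Keys in S3 event notifications are URL-encoded, with spaces encoded as "+"
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Keep only object-creation events under the (hypothetical) uploads/ prefix
function uploadedObjects(event: S3Event, prefix = "uploads/"): UploadedObject[] {
  return event.Records
    .filter((r) => r.eventName.startsWith("ObjectCreated:"))
    .map((r) => ({
      bucket: r.s3.bucket.name,
      key: decodeS3Key(r.s3.object.key),
      size: r.s3.object.size ?? 0,
    }))
    .filter((o) => o.key.startsWith(prefix));
}

// Hypothetical handler for the upload-processing Lambda function
async function handler(event: S3Event): Promise<void> {
  for (const obj of uploadedObjects(event)) {
    console.log(`Processing s3://${obj.bucket}/${obj.key} (${obj.size} bytes)`);
    // Post-processing for each uploaded object would go here
  }
}
```

Filtering by prefix inside the function is a safety net. The primary filter would normally be the prefix configured on the bucket's event notification itself.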

Multi-topic document generation

To generate comprehensive documents that cover multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

Efficient capture of complex, multifaceted knowledge
Improved document structure and coherence
Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a searchable knowledge base for your organization. To maximize its usefulness, you can integrate it with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. You can also set up regular review and update cycles to keep the knowledge base current and relevant.

Clean up

When you're done testing the solution, remove it from your AWS account to avoid future charges:

Run cdk destroy to remove the solution
You may also need to manually remove the S3 buckets created by the solution

Summary

This post demonstrated how combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React creates a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization's specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, which provides a solid foundation for your knowledge capture initiatives.

By adopting this approach to knowledge capture, organizations can unlock the full potential of their collective wisdom and drive continuous improvement.

About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Services, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly available and highly scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers on their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving a wide range of technical challenges using the latest technologies. He holds a Master's degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.