Automate invoice processing with Streamlit and Amazon Bedrock
Invoice processing is a vital but often cumbersome task for companies of all sizes, especially large enterprises dealing with invoices from multiple vendors in varying formats. The sheer volume of data, coupled with the need for accuracy and efficiency, can make invoice processing a significant challenge. Invoices can vary widely in format, structure, and content, making efficient processing at scale difficult. Traditional approaches relying on manual data entry or custom scripts for each vendor's format can not only lead to inefficiencies, but can also increase the potential for errors, resulting in financial discrepancies, operational bottlenecks, and backlogs.
To extract key details such as invoice numbers, dates, and amounts, we use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
In this post, we provide a step-by-step guide with the building blocks needed to create a Streamlit application that processes and reviews invoices from multiple vendors. Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. We use Anthropic's Claude 3 Sonnet model in Amazon Bedrock and Streamlit for building the application front end.
Solution overview
This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base. The results are shown in a Streamlit app, with the invoices and extracted information displayed side by side for quick review. Importantly, your document and data are not stored after processing.
The storage layer uses Amazon Simple Storage Service (Amazon S3) to hold the invoices that business users upload. After uploading, you can set up a regular batch job to process these invoices, extract key information, and save the results in a JSON file. In this post, we save the data in JSON format, but you can also choose to store it in your preferred SQL or NoSQL database.
The application layer uses Streamlit to display the PDF invoices alongside the extracted data from Amazon Bedrock. For simplicity, we deploy the app locally, but you can also run it on Amazon SageMaker Studio, Amazon Elastic Compute Cloud (Amazon EC2), or Amazon Elastic Container Service (Amazon ECS) if needed.
Prerequisites
To perform this solution, complete the following prerequisites:
Install dependencies and clone the example
To get started, install the necessary packages on your local machine or on an EC2 instance. If you're new to Amazon EC2, refer to the Amazon EC2 User Guide. In this tutorial, we use the local machine for project setup.
To install dependencies and clone the example, follow these steps (a consolidated command sketch follows the list):
- Clone the repository into a local folder.
- Navigate to the project directory.
- Upgrade pip.
- (Optional) Create a virtual environment to isolate dependencies, then activate it. The activation command differs between Mac/Linux and Windows.
- In the cloned directory, install the necessary Python packages.
This installs the necessary packages, including Boto3 (the AWS SDK for Python), Streamlit, and other dependencies.
- Update the `region` in the `config.yaml` file to the same Region set in your AWS CLI where Amazon Bedrock and Anthropic's Claude 3 Sonnet model are available.
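The repository URL below is a placeholder, and the requirements file name is an assumption; a minimal sketch of these setup steps might look like the following:

```
# Clone the repository into a local folder (placeholder URL)
git clone <repository-url> invoice-processing-demo
cd invoice-processing-demo

# Upgrade pip
python3 -m pip install --upgrade pip

# (Optional) create a virtual environment to isolate dependencies
python3 -m venv .venv

# Activate it on Mac/Linux
source .venv/bin/activate
# Activate it on Windows
# .venv\Scripts\activate

# Install the Python packages (assumes the repo ships a requirements.txt)
pip install -r requirements.txt
```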
After completing these steps, the invoice processor code will be set up in your local environment, ready for the next stages to process invoices using Amazon Bedrock.
Process invoices using Amazon Bedrock
Now that the environment setup is done, you're ready to start processing invoices and deploying the Streamlit app. To process invoices using Amazon Bedrock, follow these steps:
Store invoices in Amazon S3
Store invoices from different vendors in an S3 bucket. You can upload them directly using the console, the API, or as part of your regular business process. Follow these steps to upload using the AWS CLI (a command sketch follows the list):
- Create an S3 bucket. Replace `your-bucket-name` with the name of the bucket you create and `your-region` with the Region set in your AWS CLI and in `config.yaml` (for example, `us-east-1`).
- Upload invoices to the S3 bucket, either to the root of the bucket or to a specific folder (for example, `invoices`).
- Validate the upload.
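A sketch of these AWS CLI commands follows; the local `./invoices` folder is illustrative:

```
# Create an S3 bucket
aws s3 mb s3://your-bucket-name --region your-region

# Upload invoices to the root of the bucket
aws s3 cp ./invoices s3://your-bucket-name/ --recursive

# Or upload invoices to a specific folder (for example, invoices)
aws s3 cp ./invoices s3://your-bucket-name/invoices/ --recursive

# Validate the upload
aws s3 ls s3://your-bucket-name/ --recursive
```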
Process invoices with Amazon Bedrock
In this section, you process the invoices in Amazon S3 and store the results in a JSON file (`processed_invoice_output.json`). You extract the key details from the invoices (such as invoice numbers, dates, and amounts) and generate summaries.
You can trigger the processing of these invoices using the AWS CLI or automate the process with an Amazon EventBridge rule or AWS Lambda trigger. For this walkthrough, we use the AWS CLI to trigger the processing.
We packaged the processing logic in the Python script `invoices_processor.py`, which you run from the command line. The `--prefix` argument is optional: if omitted, all the PDFs in the bucket are processed. Example invocations are sketched below.
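The exact flag names depend on the script; assuming a `--bucket_name` argument alongside the `--prefix` argument mentioned above, the invocations might look like:

```
# Process only the PDFs under the invoices/ prefix
python invoices_processor.py --bucket_name your-bucket-name --prefix invoices

# Or omit --prefix to process every PDF in the bucket
python invoices_processor.py --bucket_name your-bucket-name
```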
Use the solution
This section examines the `invoices_processor.py` code. You can chat with your document either on the Amazon Bedrock console or by using the Amazon Bedrock RetrieveAndGenerate API (SDK). In this tutorial, we use the API approach. The script works as follows:
- Initialize the environment: The script imports the necessary libraries and initializes the Amazon Bedrock and Amazon S3 clients.
- Configure settings: The `config.yaml` file specifies the model ID, Region, prompts for entity extraction, and the output file location for processing.
- Set up API calls: The `RetrieveAndGenerate` API fetches the invoice from Amazon S3 and processes it using the FM. It takes several parameters, such as the prompt, source type (S3), model ID, AWS Region, and S3 URI of the invoice. A sketch of this call appears after this list.
- Batch processing: The `batch_process_s3_bucket_invoices` function batch processes the invoices in the specified S3 bucket in parallel and writes the results to the output file (`processed_invoice_output.json`, as specified by `output_file` in `config.yaml`). It relies on the `process_invoice` function, which calls the Amazon Bedrock `RetrieveAndGenerate` API for each invoice and prompt.
- Post-processing: The extracted data in `processed_invoice_output.json` can be further structured or customized to suit your needs.
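As a minimal sketch of the per-invoice call, assuming Anthropic's Claude 3 Sonnet in `us-east-1` and illustrative prompt and URI values, the chat with document request might look like the following:

```python
import boto3

# The RetrieveAndGenerate API is exposed by the Bedrock Agent Runtime client
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

def process_invoice(s3_uri: str, prompt: str) -> str:
    """Chat with a single S3 document; no knowledge base is required."""
    response = client.retrieve_and_generate(
        input={"text": prompt},
        retrieveAndGenerateConfiguration={
            "type": "EXTERNAL_SOURCES",
            "externalSourcesConfiguration": {
                "modelArn": MODEL_ARN,
                "sources": [
                    {
                        "sourceType": "S3",
                        "s3Location": {"uri": s3_uri},
                    }
                ],
            },
        },
    )
    return response["output"]["text"]

# Illustrative usage: extract key fields from one invoice
print(process_invoice(
    "s3://your-bucket-name/invoices/invoice-001.pdf",
    "Extract the invoice number, date, and total amount from this document.",
))
```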
This approach enables invoice handling from multiple vendors, each with its own unique format and structure. By using large language models (LLMs), it extracts important details such as invoice numbers, dates, amounts, and vendor information without requiring custom scripts for each vendor format.
Run the Streamlit demo
Now that you have the components in place and the invoices processed using Amazon Bedrock, it's time to deploy the Streamlit application. You can launch the app by invoking the following command:
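Assuming the Streamlit entry point is a file named `app.py` (the actual file name depends on the repository):

```
streamlit run app.py
```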
When the app is up, it opens in your default web browser. From there, you can review the invoices and the extracted data side by side. Use the Previous and Next arrows to seamlessly navigate through the processed invoices so you can interact with and analyze the results efficiently. The following screenshot shows the UI.
There are quotas for Amazon Bedrock (some of which are adjustable) that you need to consider when building at scale with Amazon Bedrock.
Cleanup
To clean up after running the demo, follow these steps:
- Delete the S3 bucket containing your invoices (see the command sketch after this list)
- If you set up a virtual environment, deactivate it by invoking `deactivate`
- Remove any local files created during the process, including the cloned repository and output files
- If you used any AWS resources such as an EC2 instance, terminate them to avoid unnecessary charges
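A sketch of the cleanup commands (note that deleting the bucket is irreversible):

```
# Delete the S3 bucket and all of its contents
aws s3 rb s3://your-bucket-name --force

# Deactivate the virtual environment, if you created one
deactivate
```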
Conclusion
In this post, we walked through a step-by-step guide to automating invoice processing using Streamlit and Amazon Bedrock, addressing the challenge of handling invoices from multiple vendors with different formats. We showed how to set up the environment, process invoices stored in Amazon S3, and deploy a user-friendly Streamlit application to review and interact with the processed data.
If you're looking to further enhance this solution, consider integrating additional features or deploying the app on scalable AWS services such as Amazon SageMaker, Amazon EC2, or Amazon ECS. Thanks to this flexibility, your invoice processing solution can evolve with your business, providing long-term value and efficiency.
We encourage you to learn more by exploring Amazon Bedrock, Access Amazon Bedrock foundation models, the RetrieveAndGenerate API, and Quotas for Amazon Bedrock, and to build a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, leave a comment.
About the Authors
Deepika Kumar is a Solution Architect at AWS. She has over 13 years of experience in the technology industry and has helped enterprises and SaaS organizations build and securely deploy their workloads on the cloud. She is passionate about using generative AI responsibly, whether that's driving product innovation, boosting productivity, or enhancing customer experiences.
Jobandeep Singh is an Associate Solution Architect at AWS specializing in machine learning. He helps customers across a wide range of industries use AWS to drive innovation and efficiency in their operations. In his free time, he enjoys playing sports, with a particular love for hockey.
Ratan Kumar is a solutions architect based out of Auckland, New Zealand. He works with large enterprise customers, helping them design and build secure, cost-effective, and reliable internet-scale applications using the AWS Cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.