Scaling medical content review at Flo Health using Amazon Bedrock (Part 1)
This blog post is based on work co-developed with Flo Health.
Healthcare science is rapidly advancing. Maintaining accurate and up-to-date medical content directly impacts people's lives, health decisions, and well-being. When someone searches for health information, they are often at their most vulnerable, making accuracy not just important, but potentially life-saving.
Flo Health creates hundreds of medical articles yearly, providing millions of users worldwide with medically credible information on women's health. Verifying the accuracy and relevance of this vast content library is a significant challenge. Medical knowledge evolves continuously, and manual review of each article is not only time-consuming but also prone to human error. This is why the team at Flo Health, the company behind the leading women's health app Flo, is using generative AI to facilitate medical content accuracy at scale. Through a partnership with the AWS Generative AI Innovation Center, Flo Health is developing an innovative approach, further referred to as the "Medical Automated Content Review and Revision Optimization Solution" (MACROS), to verify and maintain the accuracy of its extensive health information library. This AI-powered solution is capable of:
- Efficiently processing large volumes of medical content based on credible scientific sources.
- Identifying potential inaccuracies or outdated information based on credible scientific resources.
- Proposing updates based on the latest medical research and guidelines, as well as incorporating user feedback.
The system, powered by Amazon Bedrock, enables Flo Health to conduct medical content reviews and revision assessments at scale, ensuring up-to-date accuracy and supporting more informed healthcare decision-making. The system performs detailed content analysis, providing comprehensive insights on adherence to medical standards and guidelines for Flo's medical experts to review. It is also designed for seamless integration with Flo's existing tech infrastructure, facilitating automated updates where appropriate.
This two-part series explores Flo Health's journey with generative AI for medical content verification. Part 1 examines our proof of concept (PoC), including the initial solution, capabilities, and early results. Part 2 focuses on scaling challenges and real-world implementation. Each article stands alone while together showing how AI transforms medical content management at scale.
Proof of Concept goals and success criteria
Before diving into the technical solution, we established clear objectives for our PoC medical content review system:
Key Objectives:
- Validate the feasibility of using generative AI for medical content verification
- Determine accuracy levels compared to manual review
- Assess processing time and cost improvements
Success Metrics:
- Accuracy: Content piece recall of 90%
- Efficiency: Reduce detection time from hours to minutes per guideline
- Cost Reduction: Reduce expert review workload
- Quality: Maintain Flo's editorial standards and medical accuracy
- Speed: 10x faster than the manual review process
To verify the solution meets Flo Health's high standards for medical content, Flo Health's medical experts and content teams have been working closely with AWS technical specialists through regular review sessions, providing essential feedback and medical expertise to continuously improve the AI model's performance and accuracy. The result is MACROS, our custom-built solution for AI-assisted medical content verification.
Solution overview
In this section, we outline how the MACROS solution uses Amazon Bedrock and other AWS services to automate medical content review and revisions.

Figure 1. Medical Automated Content Review and Revision Optimization Solution Overview
As shown in Figure 1, the developed solution supports two major processes:
- Content Review and Revision: Assesses the medical standards and style adherence of existing medical articles at scale, given the pre-specified custom rules and guidelines, and proposes a revision that conforms to the new medical standards as well as Flo's brand and tone guidelines.
- Rule Optimization: MACROS accelerates the process of extracting new (medical) guidelines from (medical) research, pre-processing them into the format needed for content review, and optimizing their quality.
Both steps can be performed through the user interface (UI) as well as a direct API call. The UI support enables medical experts to directly see content review statistics, interact with changes, and make manual adjustments. The API call support is intended for integration into a pipeline for periodic review.
Architecture
Figure 2 depicts the architecture of MACROS. It consists of two major components: backend and frontend.
Figure 2. MACROS architecture
In the following, the flow of the main app components is presented:
1. Users begin by gathering and preparing content that must meet medical standards and rules.
2. In the second step, the data is provided as PDF or TXT files, or as text, through the Streamlit UI that is hosted on Amazon Elastic Container Service (ECS). Authentication for file upload happens through Amazon API Gateway.
3. Alternatively, custom Flo Health JSON files can be uploaded directly to the Amazon Simple Storage Service (S3) bucket of the solution stack.
4. The ECS-hosted frontend has AWS IAM permissions to orchestrate tasks using AWS Step Functions.
5. Further, the ECS container has access to S3 for listing, downloading, and uploading files, either via pre-signed URL or boto3.
6. Optionally, if the input file is uploaded via the UI, the solution invokes the AWS Step Functions service, which starts the pre-processing functionality hosted by an AWS Lambda function. This Lambda has access to Amazon Textract for extracting text from PDF files. The files are stored in S3 and also returned to the UI.
7-9. Hosted on AWS Lambda, the Rule Optimizer, Content Review, and Revision functions are orchestrated via AWS Step Functions. They have access to Amazon Bedrock for generative AI capabilities to perform rule extraction from unstructured data, content review, and revision, respectively. Additionally, they have access to S3 via the boto3 SDK to store the results.
10. The Compute Stats AWS Lambda function has access to S3 and can read and combine the results of individual revision and review runs.
11. The solution leverages Amazon CloudWatch for system monitoring and log management. For production deployments dealing with critical medical content, the monitoring capabilities could be extended with custom metrics and alarms to provide more granular insights into system performance and content processing patterns.
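As a rough illustration of step 4, the frontend could kick off a review run by starting a Step Functions execution with a JSON payload. The sketch below is a minimal assumption about what such a payload might look like; the state machine ARN, bucket, key, and field names are invented for illustration and are not the actual MACROS interface:

```python
import json

def build_review_input(bucket: str, key: str, ruleset: str, multi_call: bool = False) -> str:
    """Build the JSON input for a hypothetical content-review Step Functions execution."""
    payload = {
        "s3_bucket": bucket,       # where the uploaded article lives
        "s3_key": key,             # object key of the article (JSON/TXT/PDF)
        "ruleset": ruleset,        # e.g. "vitamin_d", "breast_health"
        "multi_call": multi_call,  # one independent Bedrock call per rule if True
    }
    return json.dumps(payload)

# Starting the execution would then look like this (requires AWS credentials):
# import boto3
# sfn = boto3.client("stepfunctions")
# sfn.start_execution(
#     stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:macros-review",
#     input=build_review_input("macros-content", "articles/vitamin-d.json", "vitamin_d"),
# )
```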
Future enhancements
While our current architecture uses AWS Step Functions for workflow orchestration, we are exploring the potential of Amazon Bedrock Flows for future iterations. Bedrock Flows offers promising capabilities for streamlining AI-driven workflows, potentially simplifying our architecture and enhancing integration with other Bedrock services. This alternative might provide more seamless management of our AI processes, especially as we scale and evolve our solution.
Content review and revision
At the core of MACROS lies its Content Review and Revision functionality, built on Amazon Bedrock foundation models. The Content Review and Revision block consists of five major components: 1) the optional Filtering stage, 2) Chunking, 3) Review, 4) Revision, and 5) Post-processing, depicted in Figure 3.
Figure 3. Content review and revision pipeline
Here is how MACROS processes the uploaded medical content:
- Filtering (Optional): The journey begins with an optional filtering step. This smart feature checks whether the rule set is relevant for the article, potentially saving time and resources on unnecessary processing.
- Chunking: The source text is then split into paragraphs. This crucial step facilitates good-quality review and helps prevent unintended revisions to unrelated text. Chunking can be done using heuristics, such as punctuation or regular expression-based splits, as well as using large language models (LLMs) to identify semantically complete chunks of text.
- Review: Each paragraph or section undergoes a thorough review against the relevant rules and guidelines.
- Revision: Only the paragraphs flagged as non-adherent move forward to the revision stage, streamlining the process and maintaining the integrity of adherent content. The AI suggests updates to bring non-adherent paragraphs in line with the latest guidelines and Flo's brand requirements.
- Post-processing: Finally, the revised paragraphs are seamlessly integrated back into the original text, resulting in an updated, adherent document.
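The heuristic chunking described above can be sketched in a few lines. This is a minimal illustration using a blank-line regex split; the production splitter and any LLM-based chunking are more involved and are not shown:

```python
import re

def chunk_paragraphs(text: str) -> list[str]:
    """Split source text into paragraph chunks on blank lines (a simple heuristic)."""
    # One or more newlines separated by optional whitespace marks a paragraph boundary.
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]

doc = "Vitamin D supports bone health.\n\nAdults may need 600 IU daily.\n\n"
print(chunk_paragraphs(doc))
# ['Vitamin D supports bone health.', 'Adults may need 600 IU daily.']
```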
The Filtering step can be performed using an additional LLM call via Amazon Bedrock that assesses each section individually with the following prompt structure:
Figure 4. Simplified LLM-based filtering step
Further, non-LLM approaches can be feasible to support the Filtering step:
- Encoding the rules and the articles into dense embedding vectors and calculating the similarity between them. By setting a similarity threshold, we can identify which rule set is considered to be relevant for the input document.
- Similarly, the direct keyword-level overlap between the document and the rule can be identified using BLEU or ROUGE metrics.
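As a rough sketch of the keyword-overlap idea, a ROUGE-1-style unigram recall between a rule and a document can be computed without any ML dependencies. The tokenization and threshold here are illustrative only, not the values used in MACROS:

```python
def rouge1_recall(rule: str, document: str) -> float:
    """Fraction of the rule's unigrams that also appear in the document."""
    rule_tokens = set(rule.lower().split())
    doc_tokens = set(document.lower().split())
    if not rule_tokens:
        return 0.0
    return len(rule_tokens & doc_tokens) / len(rule_tokens)

def is_relevant(rule: str, document: str, threshold: float = 0.3) -> bool:
    # Illustrative threshold; in practice it would be tuned on labeled examples.
    return rouge1_recall(rule, document) >= threshold

doc = "vitamin d intake supports bone health in adults"
print(is_relevant("vitamin d bone density", doc))   # True (3 of 4 unigrams overlap)
print(is_relevant("breast cancer screening", doc))  # False (no overlap)
```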
Content review, as already mentioned, is performed on a text-section basis against a group of rules and results in a response in XML format, such as:
Here, 1 indicates adherence and 0 non-adherence of the text to the specified rules. Using the XML format helps to achieve reliable parsing of the output.
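The exact response schema used by MACROS is not reproduced above; assuming a hypothetical shape such as one `<rule id="...">` element per rule holding the 1/0 flag, the adherence flags could be parsed reliably with the standard library:

```python
import xml.etree.ElementTree as ET

def parse_adherence(xml_response: str) -> dict[str, bool]:
    """Map rule id -> adherence (True = adherent) from a hypothetical review response."""
    root = ET.fromstring(xml_response)
    return {rule.get("id"): rule.text.strip() == "1" for rule in root.iter("rule")}

response = "<review><rule id='1'>1</rule><rule id='2'>0</rule></review>"
print(parse_adherence(response))  # {'1': True, '2': False}
```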
This Review step iterates over the sections in the text to make sure that the LLM pays attention to each section individually, which led to more robust results in our experimentation. To facilitate higher non-adherent section detection accuracy, the user can also use the Multi-call mode, where instead of one Amazon Bedrock call assessing adherence of the article against all rules, we have one independent call per rule.
The Revision step receives the output of the Review (non-adherent sections and the reasons for non-adherence), as well as the instruction to create the revision in a similar tone. It then suggests revisions of the non-adherent sentences in a style similar to the original text. Finally, the Post-processing step combines the original text with the new revisions, making sure that no other sections are modified.
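The post-processing merge can be sketched as a positional substitution: only sections that were flagged and revised are replaced, and everything else passes through unchanged. The indices and data shapes below are assumptions for illustration, not the MACROS internals:

```python
def merge_revisions(sections: list[str], revisions: dict[int, str]) -> str:
    """Reassemble the document, swapping in revisions only at flagged section indices."""
    merged = [revisions.get(i, original) for i, original in enumerate(sections)]
    return "\n\n".join(merged)

sections = ["Intro paragraph.", "Outdated dosage advice.", "Closing paragraph."]
revisions = {1: "Updated dosage advice per current guidelines."}
print(merge_revisions(sections, revisions))
```

Keeping the merge index-based (rather than string-matching on revised text) is one way to guarantee that adherent sections come back byte-for-byte identical.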
Different steps of the flow require different levels of LLM complexity. While simpler tasks like chunking can be done efficiently with a relatively small model like those in the Claude Haiku model family, more complex reasoning tasks like content review and revision require larger models like those in the Claude Sonnet or Opus model families to facilitate accurate analysis and high-quality content generation. This tiered approach to model selection optimizes both the performance and cost-efficiency of the solution.
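This tiered selection can be expressed as a simple task-to-model routing table. The model IDs below follow Amazon Bedrock's naming convention but are given as examples, so check the current Bedrock model catalog for available IDs before using them:

```python
# Illustrative routing of pipeline tasks to Bedrock model tiers.
MODEL_BY_TASK = {
    "chunking": "anthropic.claude-3-haiku-20240307-v1:0",   # small, cheap, fast
    "filtering": "anthropic.claude-3-haiku-20240307-v1:0",
    "review": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # stronger reasoning
    "revision": "anthropic.claude-3-5-sonnet-20240620-v1:0",
}

def model_for(task: str) -> str:
    """Return the model ID for a task, defaulting to the stronger tier."""
    return MODEL_BY_TASK.get(task, "anthropic.claude-3-5-sonnet-20240620-v1:0")

print(model_for("chunking"))  # anthropic.claude-3-haiku-20240307-v1:0
```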
Operating modes
The Content Review and Revision feature operates in two UI modes: Detailed Document Processing and Multi Document Processing, each catering to different scales of content management. The Detailed Document Processing mode offers a granular approach to content review and is depicted in Figure 5. Users can upload documents in various formats (PDF, TXT, JSON, or pasted text) and specify the guidelines against which the content should be evaluated.

Figure 5. Detailed Document Processing example
Users can choose from predefined rule sets (here, Vitamin D, Breast Health, and Premenstrual Syndrome and Premenstrual Dysphoric Disorder (PMS and PMDD)) or enter custom guidelines. These custom guidelines can include rules such as "The title of the article must be medically accurate," as well as examples of content that is adherent and non-adherent to the rule.
The rule sets make sure that the review aligns with specific medical standards and Flo's unique brand guide. The interface allows for on-the-fly adjustments, making it ideal for thorough, individual document reviews. For larger-scale operations, the Multi Document Processing mode should be used. This mode is designed to handle numerous custom JSON files simultaneously, mimicking how Flo would integrate MACROS into its content management system.
Extracting rules and guidelines from unstructured data
Actionable and well-prepared guidelines are not always immediately available. Sometimes they are given in unstructured files or have to be found. Using the Rule Optimizer feature, we can extract and refine actionable guidelines from multiple complex documents.
Rule Optimizer processes raw PDF documents to extract text, which is then chunked into meaningful sections based on document headers. This segmented content is processed through Amazon Bedrock using specialized system prompts, with two distinct modes: Style/tonality mode and Medical mode.
Style/tonality mode focuses on extracting guidelines on how the text should be written, its style, and what formats and terms can or cannot be used.
Rule Optimizer assigns a priority to each rule: high, medium, or low. The priority level indicates the rule's importance, guiding the order of content review and focusing attention on critical areas first. Rule Optimizer includes a manual editing interface where users can refine rule text, adjust classifications, and manage priorities. Subsequently, if users need to update a given rule, the changes are saved for future use in Amazon S3.
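A rule record of this shape could be represented and serialized to JSON before being written to S3. The field names and categories below are assumptions for illustration, not the actual MACROS schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Rule:
    """A single extracted guideline with its review priority (illustrative schema)."""
    rule_id: str
    text: str
    priority: str  # "high" | "medium" | "low"
    category: str  # e.g. "medical_condition", "style"

rule = Rule("vit-d-001", "State vitamin D dosage ranges per current guidelines.",
            "high", "medical_condition")
serialized = json.dumps(asdict(rule))  # this payload would then be uploaded to S3
print(serialized)
```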
The Medical mode is designed to process medical documents and is adapted to more scientific language. It allows grouping of the extracted rules into three classes:
- Medical condition guidelines
- Treatment-specific guidelines
- Changes to advice and developments in health
Figure 6. Simplified medical rule optimization prompt
Figure 6 provides an example of a medical rule optimization prompt, consisting of three main components: role setting (medical AI expert), a description of what makes a good rule, and finally the expected output. We consider a rule to be of sufficiently good quality if it is:
- Clear, unambiguous, and actionable
- Relevant, consistent, and concise (max two sentences)
- Written in active voice
- Free of unnecessary jargon
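Some of these quality criteria lend themselves to cheap programmatic pre-checks before a rule ever reaches expert review. A minimal sketch follows; the sentence-count heuristic is deliberately naive and only illustrative, and criteria like active voice still require an LLM or a human:

```python
import re

MAX_SENTENCES = 2

def rule_pre_check(rule_text: str) -> list[str]:
    """Return a list of quality issues found by simple heuristics (empty = passed)."""
    issues = []
    text = rule_text.strip()
    if not text:
        issues.append("empty rule")
        return issues
    # Naive sentence count: split on ., !, ? followed by whitespace or end of string.
    sentences = [s for s in re.split(r"[.!?](?:\s+|$)", text) if s.strip()]
    if len(sentences) > MAX_SENTENCES:
        issues.append("longer than two sentences")
    return issues

print(rule_pre_check("Use IU units for vitamin D. Cite the guideline year."))  # []
```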
Implementation considerations and challenges
During our PoC development, we identified several important considerations that may benefit others implementing similar solutions:
- Data preparation: This emerged as a fundamental challenge. We learned the importance of standardizing input formats for both medical content and guidelines while maintaining consistent document structures. Creating diverse test sets across different medical topics proved essential for comprehensive validation.
- Cost management: Monitoring and optimizing cost quickly became a key priority. We implemented token usage monitoring and optimized prompt design and batch processing to balance performance and efficiency.
- Regulatory and ethical compliance: Given the sensitive nature of medical content, strict regulatory and ethical safeguards were essential. We established robust documentation practices for AI decisions, implemented strict version control for medical guidelines, and maintained continuous human medical expert oversight of the AI-generated suggestions. Regional healthcare regulations were carefully considered throughout the implementation.
- Integration and scaling: We recommend starting with a standalone testing environment while planning for future content management system (CMS) integration through well-designed API endpoints. Building with modularity in mind proved valuable for future enhancements. Throughout the process, we faced common challenges such as maintaining context in long medical articles, balancing processing speed with accuracy, and facilitating a consistent tone across AI-suggested revisions.
- Model optimization: The diverse model selection capability of Amazon Bedrock proved particularly valuable. Through its platform, we can choose optimal models for specific tasks, achieve cost efficiency without sacrificing accuracy, and smoothly upgrade to newer models, all while maintaining our existing architecture.
Initial Results
Our Proof of Concept delivered strong results across the critical success metrics, demonstrating the potential of AI-assisted medical content review. The solution exceeded target processing speed improvements while maintaining 80% accuracy and over 90% recall in identifying content requiring updates. Most notably, the AI-powered system applied medical guidelines more consistently than manual reviews and significantly reduced the time burden on medical experts.
Key Takeaways
During implementation, we uncovered several insights critical for optimizing AI performance in medical content review. Content chunking was essential for accurate review across long documents, and expert validation of parsing rules helped medical experts maintain scientific precision. Most importantly, the project confirmed that human-AI collaboration, not full automation, is key to successful implementation. Regular expert feedback and clear performance metrics guided system refinements and incremental improvements. While the system significantly streamlines the review process, it works best as an augmentation tool, with medical experts remaining essential for final validation, creating a more efficient hybrid approach to medical content management.
Conclusion and next steps
This first part of our series demonstrates how generative AI can make the medical content review process faster, more efficient, and scalable while maintaining high accuracy. Stay tuned for Part 2 of this series, where we cover the production journey, deep-diving into challenges and scaling strategies. Are you ready to move your AI initiatives into production?
About the authors
Liza (Elizaveta) Zinovyeva, Ph.D., is an Applied Scientist at the AWS Generative AI Innovation Center and is based in Berlin. She helps customers across different industries integrate generative AI into their existing applications and workflows. She is passionate about AI/ML, finance, and software security topics. In her spare time, she enjoys spending time with her family, sports, learning new technologies, and table quizzes.
Callum Macpherson is a Data Scientist at the AWS Generative AI Innovation Center, where cutting-edge AI meets real-world business transformation. Callum partners directly with AWS customers to design, build, and scale generative AI solutions that unlock new opportunities, accelerate innovation, and deliver measurable impact across industries.
Arefeh Ghahvechi is a Senior AI Strategist at the AWS GenAI Innovation Center, specializing in helping customers realize rapid value from generative AI technologies by bridging innovation and implementation. She identifies high-impact AI opportunities while building the organizational capabilities needed for scaled adoption across enterprises and national initiatives.
Nuno Castro is a Sr. Applied Science Manager. He has 19 years of experience in the field across industries such as finance, manufacturing, and travel, leading ML teams for 11 years.
Dmitrii Ryzhov is a Senior Account Manager at Amazon Web Services (AWS), helping digital-native companies unlock business potential through AI, generative AI, and cloud technologies. He works closely with customers to identify high-impact business initiatives and accelerate execution by orchestrating strategic AWS support, including access to the right expertise, resources, and innovation programs.
Nikita Kozodoi, PhD, is a Senior Applied Scientist at the AWS Generative AI Innovation Center working on the frontier of AI research and business. Nikita builds and deploys generative AI and ML solutions that solve real-world problems and drive business impact for AWS customers across industries.
Aiham Taleb, PhD, is a Senior Applied Scientist at the Generative AI Innovation Center, working directly with AWS enterprise customers to leverage Gen AI across multiple high-impact use cases. Aiham holds a PhD in unsupervised representation learning and has industry experience spanning various machine learning applications, including computer vision, natural language processing, and medical imaging.