Transform, analyze, and discover insights from unstructured healthcare data using Amazon HealthLake

Healthcare data is complex and siloed, and exists in various formats. An estimated 80% of data within organizations is considered to be unstructured or “dark” data that is locked inside text, emails, PDFs, and scanned documents. This data is difficult to interpret or analyze programmatically and limits how organizations can derive insights from it and serve their customers more effectively. The rapid rate of data generation means that organizations that aren’t investing in document automation risk getting stuck with legacy processes that are manual, slow, error prone, and difficult to scale.

In this post, we propose a solution that automates ingestion and transformation of previously untapped PDFs and handwritten clinical notes and data. We explain how to extract information from customer clinical data charts using Amazon Textract, then use the raw extracted text to identify discrete data elements using Amazon Comprehend Medical. We store the final output in Fast Healthcare Interoperability Resources (FHIR) compatible format in Amazon HealthLake, making it available for downstream analytics.

Solution overview

AWS provides a variety of services and solutions for healthcare providers to unlock the value of their data. For our solution, we process a small sample of documents through Amazon Textract and load that extracted data as appropriate FHIR resources in Amazon HealthLake. We create a custom process for FHIR conversion and test it end to end.

The data is first loaded into DocumentReference. Amazon HealthLake then creates system-generated resources after processing this unstructured text in DocumentReference and loads them into Condition, MedicationStatement, and Observation resources. We identify several data fields within FHIR resources like patient ID, date of service, provider type, and name of medical facility.
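As a rough illustration of the first loading step, the following Python sketch wraps extracted text in a minimal FHIR R4 DocumentReference, with the text carried as base64-encoded attachment data. The function name, the patient ID, and the choice of fields are assumptions made for illustration; the resource that the solution's conversion code actually emits may contain more or different elements.

```python
import base64
import json


def build_document_reference(extracted_text: str, patient_id: str) -> dict:
    """Wrap raw extracted text in a minimal FHIR R4 DocumentReference (illustrative)."""
    # FHIR attachments carry inline content as base64Binary.
    encoded = base64.b64encode(extracted_text.encode("utf-8")).decode("ascii")
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        # Hypothetical patient reference; a real pipeline would derive it from the chart.
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [
            {
                "attachment": {
                    "contentType": "text/plain",
                    "data": encoded,
                }
            }
        ],
    }


if __name__ == "__main__":
    resource = build_document_reference("Patient reports chest pain.", "example-patient-1")
    print(json.dumps(resource, indent=2))
```

Decoding `content[0].attachment.data` with `base64.b64decode` returns the original text, which is a quick way to check what was stored.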

A MedicationStatement is a record of a medication that is being consumed by a patient. It may indicate that the patient is taking the medication now, has taken the medication in the past, or will be taking the medication in the future. A common scenario where this information is captured is during the history-taking process in the course of a patient visit or stay. The source of medication information can be the patient’s memory, a prescription bottle, or a list of medications the patient, clinician, or other party maintains.
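The three timings described above correspond to values of the FHIR R4 MedicationStatement status element: `completed`, `active`, and `intended`. The sketch below is a hand-written illustration under that mapping; the `Timing` enum, the function name, and the sample values are hypothetical, and this is not the output format that Amazon HealthLake generates.

```python
from enum import Enum
from typing import Optional


class Timing(Enum):
    """Hypothetical helper enum for when the medication is taken."""
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


# FHIR R4 MedicationStatement.status codes matching each timing.
STATUS_BY_TIMING = {
    Timing.PAST: "completed",
    Timing.PRESENT: "active",
    Timing.FUTURE: "intended",
}


def build_medication_statement(
    medication_text: str,
    patient_id: str,
    timing: Timing,
    information_source: Optional[str] = None,
) -> dict:
    """Build a minimal MedicationStatement as a plain dict (illustrative)."""
    resource = {
        "resourceType": "MedicationStatement",
        "status": STATUS_BY_TIMING[timing],
        "medicationCodeableConcept": {"text": medication_text},
        "subject": {"reference": f"Patient/{patient_id}"},
    }
    if information_source is not None:
        # Who reported the medication, for example the patient or a practitioner.
        resource["informationSource"] = {"reference": information_source}
    return resource


if __name__ == "__main__":
    statement = build_medication_statement(
        "Lisinopril 10 mg tablet", "example-patient-1", Timing.PRESENT,
        information_source="Patient/example-patient-1",
    )
    print(statement["status"])
```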

Observations are a central element in healthcare, used to support diagnosis, monitor progress, determine baselines and patterns, and even capture demographic characteristics. Most observations are simple name/value pair assertions with some metadata, but some observations group other observations together logically, or may even be multi-component observations.
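To make the distinction concrete, here is a small Python sketch that contrasts a simple name/value observation (heart rate) with a multi-component one (blood pressure with systolic and diastolic components). The LOINC and UCUM codes are standard, but the measured values and the helper names are made up for illustration and do not come from the sample medical record.

```python
LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def coded(code: str, display: str) -> dict:
    """Build a CodeableConcept with a single LOINC coding."""
    return {"coding": [{"system": LOINC, "code": code, "display": display}]}


def quantity(value: float, unit: str, ucum_code: str) -> dict:
    """Build a FHIR Quantity expressed in UCUM units."""
    return {"value": value, "unit": unit, "system": UCUM, "code": ucum_code}


# Simple observation: one code (the name) and one value.
heart_rate = {
    "resourceType": "Observation",
    "status": "final",
    "code": coded("8867-4", "Heart rate"),
    "valueQuantity": quantity(72, "beats/minute", "/min"),
}

# Multi-component observation: the panel code groups two component values.
blood_pressure = {
    "resourceType": "Observation",
    "status": "final",
    "code": coded("85354-9", "Blood pressure panel with all children optional"),
    "component": [
        {
            "code": coded("8480-6", "Systolic blood pressure"),
            "valueQuantity": quantity(120, "mmHg", "mm[Hg]"),
        },
        {
            "code": coded("8462-4", "Diastolic blood pressure"),
            "valueQuantity": quantity(80, "mmHg", "mm[Hg]"),
        },
    ],
}


def is_multi_component(observation: dict) -> bool:
    """Return True when the observation carries its values in components."""
    return bool(observation.get("component"))


if __name__ == "__main__":
    for obs in (heart_rate, blood_pressure):
        print(obs["code"]["coding"][0]["display"], is_multi_component(obs))
```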

The Condition resource is used to record detailed information about a condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern. The condition can be a point-in-time diagnosis in the context of an encounter, an item on the practitioner’s problem list, or a concern that doesn’t exist on the practitioner’s problem list.
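In FHIR R4, the first two cases are typically expressed through the Condition category codes `encounter-diagnosis` and `problem-list-item`, while a concern outside the problem list can simply omit the category. The following sketch shows that idea with a hypothetical helper; the function name and sample text are assumptions, and the Condition resources that Amazon HealthLake generates may be shaped differently.

```python
from typing import Optional

CATEGORY_SYSTEM = "http://terminology.hl7.org/CodeSystem/condition-category"
CLINICAL_STATUS_SYSTEM = "http://terminology.hl7.org/CodeSystem/condition-clinical"
ALLOWED_CATEGORIES = ("encounter-diagnosis", "problem-list-item")


def build_condition(description: str, patient_id: str,
                    category: Optional[str] = None) -> dict:
    """Build a minimal FHIR R4 Condition as a plain dict (illustrative)."""
    if category is not None and category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unsupported category: {category}")
    resource = {
        "resourceType": "Condition",
        "clinicalStatus": {
            "coding": [{"system": CLINICAL_STATUS_SYSTEM, "code": "active"}]
        },
        "code": {"text": description},
        "subject": {"reference": f"Patient/{patient_id}"},
    }
    if category is not None:
        # Distinguishes an encounter diagnosis from a problem-list entry.
        resource["category"] = [
            {"coding": [{"system": CATEGORY_SYSTEM, "code": category}]}
        ]
    return resource


if __name__ == "__main__":
    print(build_condition("Hypertension", "example-patient-1", "problem-list-item"))
    print(build_condition("Patient worried about memory loss", "example-patient-1"))
```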

The following diagram shows the workflow to migrate unstructured data into FHIR for AI and machine learning (ML) analysis in Amazon HealthLake.

The workflow steps are as follows:

  1. A document is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. The document upload in Amazon S3 triggers an AWS Lambda function.
  3. The Lambda function sends the image to Amazon Textract.
  4. Amazon Textract extracts text from the image and stores the output in a separate Amazon Textract output S3 bucket.
  5. The final result is stored as specific FHIR resources (the extracted text is loaded in DocumentReference as base64 encoded text) in Amazon HealthLake to extract meaning from the unstructured data with integrated Amazon Comprehend Medical for easy search and querying.
  6. Users can create meaningful analyses and run interactive analytics using Amazon Athena.
  7. Users can build visualizations, perform ad hoc analysis, and quickly get business insights using Amazon QuickSight.
  8. Users can make predictions with health data using Amazon SageMaker ML models.
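Steps 2 through 4 can be sketched as a Lambda handler that reads the uploaded object's location from the S3 event and starts an asynchronous Amazon Textract text detection job whose results are written to a second bucket. This is a minimal sketch, not the code in the GitHub repo: the helper name, the `TEXTRACT_OUTPUT_BUCKET` environment variable, and the output prefix are assumptions, and the repo may call Textract differently.

```python
import os
from urllib.parse import unquote_plus


def textract_request_from_s3_event(event: dict, output_bucket: str,
                                   output_prefix: str = "textract-output") -> dict:
    """Translate an S3 put event into StartDocumentTextDetection arguments."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_info["object"]["key"])
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "OutputConfig": {"S3Bucket": output_bucket, "S3Prefix": output_prefix},
    }


def handler(event, context):
    """Lambda entry point: start an asynchronous Textract job for the upload."""
    # Imported lazily so the pure helper above can be exercised without AWS access.
    import boto3

    textract = boto3.client("textract")
    # Hypothetical environment variable naming the Textract output bucket.
    request = textract_request_from_s3_event(event, os.environ["TEXTRACT_OUTPUT_BUCKET"])
    response = textract.start_document_text_detection(**request)
    return {"jobId": response["JobId"]}


if __name__ == "__main__":
    sample_event = {
        "Records": [{"s3": {
            "bucket": {"name": "example-input-bucket"},
            "object": {"key": "uploads/SampleMedicalRecord.pdf"},
        }}]
    }
    print(textract_request_from_s3_event(sample_event, "example-output-bucket"))
```

The asynchronous API is used here because multi-page PDFs are processed as jobs rather than single synchronous calls; a later step would read the job output from the output bucket.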


This post assumes familiarity with the AWS services used in the preceding workflow.

By default, the integrated Amazon Comprehend Medical natural language processing (NLP) capability within Amazon HealthLake is disabled in your AWS account. To enable it, submit a support case with your account ID, AWS Region, and Amazon HealthLake data store ARN. For more information, refer to How do I turn on HealthLake’s integrated natural language processing feature.

Refer to the GitHub repo for more deployment details.

Deploy the solution architecture

To set up the solution, complete the following steps:

  1. Clone the GitHub repo, run cdk deploy PdfMapperToFhirWorkflow from your command prompt or terminal, and follow the README file. Deployment completes in approximately 30 minutes.
  2. On the Amazon S3 console, navigate to the bucket starting with pdfmappertofhirworkflow-, which was created as part of cdk deploy.
  3. Inside the bucket, create a folder called uploads and upload the sample PDF (SampleMedicalRecord.pdf).

As soon as the document upload is successful, it triggers the pipeline, and you can start seeing data in Amazon HealthLake, which you can query using several AWS tools.
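If you prefer to script the upload instead of using the console, a sketch along the following lines should work, assuming boto3 is installed and your credentials can list buckets and write to the workflow bucket. The helper names are hypothetical; only the bucket prefix, the uploads folder, and the sample file name come from the steps above.

```python
import os


def find_workflow_bucket(bucket_names, prefix="pdfmappertofhirworkflow-"):
    """Pick the single bucket whose name starts with the workflow prefix."""
    matches = [name for name in bucket_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one bucket with prefix {prefix!r}, found {len(matches)}")
    return matches[0]


def upload_sample(pdf_path="SampleMedicalRecord.pdf"):
    """Upload the sample PDF under uploads/ so the pipeline is triggered."""
    # Imported lazily so the bucket-selection helper can be tested without AWS access.
    import boto3

    s3 = boto3.client("s3")
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    bucket = find_workflow_bucket(names)
    key = f"uploads/{os.path.basename(pdf_path)}"
    s3.upload_file(pdf_path, bucket, key)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # Demonstrates bucket selection only; call upload_sample() to perform the upload.
    print(find_workflow_bucket(["logs-bucket", "pdfmappertofhirworkflow-abc123"]))
```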

Query the data

To explore your data, complete the following steps:

  1. On the CloudWatch console, search for the HealthlakeTextract log group.
  2. In the log group details, note down the unique ID of the document you processed.
  3. On the Amazon HealthLake console, choose Data Stores in the navigation pane.
  4. Select your data store and choose Run query.
  5. For Query type, choose Search with GET.
  6. For Resource type, choose DocumentReference.
  7. For Search parameters, enter the parameter as relatesto and the value as DocumentReference/Unique ID.
  8. Choose Run query.
  9. In the Response body section, minimize the resource sections to just view the six resources that were created for the six-page PDF document.
  10. The following screenshot shows the integrated analysis with Amazon Comprehend Medical and NLP enabled. The screenshot on the left is the source PDF; the screenshot on the right is the NLP result from Amazon HealthLake.
  11. You can also run a query with Query type set as Read and Resource type set as Condition using the appropriate resource ID.

    The following screenshot shows the query results.
  12. On the Athena console, run the following query:
    SELECT * FROM "healthlakestore"."documentreference";

Similarly, you can query MedicationStatement, Condition, and Observation resources.
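Both query paths in this section can be sketched in code. The first helper builds the FHIR REST search that corresponds to the console's Search with GET, using the FHIR R4 DocumentReference search parameter `relatesto`; the second generates the Athena statement for each resource table. The endpoint shown is a placeholder, the helper names are hypothetical, and the assumption that the other Athena tables are named after the lowercased resource type follows the documentreference example rather than anything confirmed here. Requests sent to the data store endpoint generally also need AWS Signature Version 4 signing, which this sketch leaves out.

```python
from urllib.parse import urlencode


def document_search_url(datastore_endpoint: str, document_id: str) -> str:
    """Build a FHIR search URL for resources related to one DocumentReference."""
    base = datastore_endpoint.rstrip("/")
    # Keep "/" unescaped so the reference value stays readable in the URL.
    query = urlencode({"relatesto": f"DocumentReference/{document_id}"}, safe="/")
    return f"{base}/DocumentReference?{query}"


def select_all(resource_type: str, database: str = "healthlakestore") -> str:
    """Generate the Athena query for one FHIR resource table."""
    if not resource_type.isalnum():
        raise ValueError("resource type must be alphanumeric")
    # Assumption: table names are the lowercased FHIR resource type.
    return f'SELECT * FROM "{database}"."{resource_type.lower()}";'


if __name__ == "__main__":
    # Placeholder endpoint; copy the real one from your data store details.
    endpoint = "https://healthlake.us-east-1.amazonaws.com/datastore/EXAMPLE_DATASTORE_ID/r4/"
    print(document_search_url(endpoint, "EXAMPLE_UNIQUE_ID"))
    for resource in ("MedicationStatement", "Condition", "Observation"):
        print(select_all(resource))
```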

Clean up

After you’re done using this solution, run cdk destroy PdfMapperToFhirWorkflow to ensure you don’t incur additional charges. For more information, refer to AWS CDK Toolkit (cdk command).


AWS AI services and Amazon HealthLake can help store, transform, query, and analyze insights from unstructured healthcare data. Although this post only covered a PDF clinical chart, you can extend the solution to other types of healthcare PDFs, images, and handwritten notes. After the data is extracted into text form, parsed into discrete data elements using Amazon Comprehend Medical, and stored in Amazon HealthLake, it can be further enriched by downstream systems to drive meaningful and actionable healthcare information and ultimately improve patient health outcomes.

The proposed solution doesn’t require the deployment and maintenance of server infrastructure. All services are either managed by AWS or serverless. With AWS’s pay-as-you-go billing model and its depth and breadth of services, the cost and effort of initial setup and experimentation is significantly lower than traditional on-premises alternatives.


About the Authors

Shravan Vurputoor is a Senior Solutions Architect at AWS. As a trusted customer advocate, he helps organizations understand best practices around advanced cloud-based architectures, and provides advice on strategies to help drive successful business outcomes across a broad set of enterprise customers through his passion for educating, training, designing, and building cloud solutions. In his spare time, he enjoys reading, spending time with his family, and cooking.

Rafael M. Koike is a Principal Solutions Architect at AWS supporting Enterprise customers in the South East, and is part of the Storage and Security Technical Field Community. Rafael has a passion to build, and his expertise in security, storage, networking, and application development has been instrumental in helping customers move to the cloud securely and fast.

Randheer Gehlot is a Principal Customer Solutions Manager at AWS. Randheer is passionate about AI/ML and its application within the HCLS industry. As an AWS builder, he works with large enterprises to design and rapidly implement strategic migrations to the cloud and build modern, cloud-native solutions.
