Extracting custom structured information from images or PDF documents
Cloudflight and ITSV have developed an AI system that supports the clerks at some Austrian health insurance companies who handle the reimbursement of elective medical expenses, helping them cope with the ever-increasing paper load. Medical bills are digitized, processed and digitally archived.
Important information regarding doctor, patient and service is automatically extracted and checked against data from existing systems. In this way, the manual processing of invoices is optimized, with employees only having to check and confirm the results.
Challenge
Previously, clerks had to process all medical invoices manually to extract the relevant information such as doctor, patient, treatment and costs. More and more people are visiting a doctor of their choice, so the number of invoices that the social insurance companies have to process manually is increasing by 8% every year. In recent years, this steadily growing caseload has left many insurance companies with a mounting backlog of unprocessed documents.


Idea
Support the clerks by automatically extracting and validating the information from the invoices, so that they don’t have to type the information manually, but only have to check it.
The idea was to combine several complex processing steps:
- Machine text recognition in combination with content recognition through artificial intelligence.
- Creation of a highly scalable system for processing large amounts of data.
Solution
A scalable, web-based AI system that processes the scans of the invoices and presents the results to the clerk in an intuitive view, where the extracted data can be reviewed and corrected if necessary.
The development work and the resulting system comprise the following components:
- Complete integration into the existing software landscape
- Acceptance of scanned or photographed fee bills
- Document pre-processing
- OCR (Optical Character Recognition)
- Document classification (fee note, proof of payment, medical referral, other)
- Entity extraction (patient, insured person, attending physician, invoiced services, diagnoses made, costs)
- Assignment of entities to each other (date to service, name to insurance number)
- Post-processing / plausibility check
- Recoding of services in free text to the service catalogs of the providers
- Recoding of therapeutic products to the official reimbursement code for medicinal products
- Recoding of diagnoses to the official ICD-10 catalog
- Comparison of services with diagnoses
- Recognition and reading of handwritten IBANs
- Detection of handwritten annotations and differentiation of signatures
- Recognition of a balancing note
- Intuitive UI
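
As an illustration of the plausibility checks listed above, a recognized handwritten IBAN can be verified with the standard mod-97 checksum defined for IBANs. The source does not say which checks the system actually applies, so this is only a minimal sketch under that assumption. The function names `normalize_iban` and `is_valid_iban` are hypothetical, and the example IBAN is the commonly published Austrian sample number, not real data.

```python
def normalize_iban(raw: str) -> str:
    """Remove whitespace and upper-case a candidate IBAN as read by OCR."""
    return "".join(raw.split()).upper()


def is_valid_iban(raw: str) -> bool:
    """Check the mod-97 checksum of an IBAN (generic, not country-specific)."""
    iban = normalize_iban(raw)
    # An IBAN has at most 34 characters; 15 is the shortest national length.
    if not (15 <= len(iban) <= 34):
        return False
    if not (iban.isascii() and iban.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # Replace letters by numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1


if __name__ == "__main__":
    print(is_valid_iban("AT61 1904 3002 3457 3201"))  # sample IBAN
    print(is_valid_iban("AT61 1904 3002 3457 3202"))  # one digit misread
```

A check like this catches most single-character misreadings, so a failed checksum is a useful signal to flag the field for manual review by the clerk.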


Modular, automated system
Our new, modular system relieves the clerks of all steps that can be automated. The individual components, such as text recognition, entity recognition and post-processing, are decoupled from each other and communicate via queues. The automated work steps are coordinated through messages in the queues and data in databases. Wherever possible, steps run in parallel on different systems, and follow-up processing starts as soon as all prerequisite steps have completed. The system is scalable and tolerates failures of individual instances.
The result of the processing is visible in a modern web interface and the data can be corrected for special cases before the reimbursement of costs to the claimant can take place.
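
The coordination described above, where a follow-up step starts as soon as all of its prerequisite steps have completed, can be sketched roughly as follows. This is a simplified, single-process illustration, not the actual implementation: the step names, the `DEPENDENCIES` table and `run_pipeline` are hypothetical, an in-memory `deque` stands in for a real message queue, and a plain set stands in for state kept in a database.

```python
from collections import deque

# Hypothetical steps and their prerequisites (assumed for illustration).
DEPENDENCIES = {
    "preprocess": set(),
    "ocr": {"preprocess"},
    "classify": {"ocr"},
    "extract_entities": {"ocr"},
    "postprocess": {"classify", "extract_entities"},
}


def run_pipeline(document_id, handlers):
    """Process one document and return the order in which steps ran."""
    queue = deque()    # stands in for a message queue
    done = set()       # stands in for completion state in a database
    scheduled = set()  # steps that already have a message in flight
    order = []

    def schedule_ready_steps():
        # Enqueue every step whose prerequisites have all completed.
        for step, needs in DEPENDENCIES.items():
            if step not in scheduled and needs <= done:
                scheduled.add(step)
                queue.append({"document_id": document_id, "step": step})

    schedule_ready_steps()
    while queue:
        message = queue.popleft()
        handlers[message["step"]](message["document_id"])
        done.add(message["step"])
        order.append(message["step"])
        schedule_ready_steps()
    return order


if __name__ == "__main__":
    # Dummy handlers that do nothing; real ones would call OCR, models, etc.
    handlers = {step: (lambda doc_id: None) for step in DEPENDENCIES}
    print(run_pipeline("doc-1", handlers))
```

In this sketch, `classify` and `extract_entities` are both enqueued as soon as `ocr` has finished, which is the point at which separate workers could handle them in parallel, while `postprocess` waits until both are done.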

Machine learning application
Using machine learning, the system was taught to find the required content in a large number of differently structured medical invoices in order to be able to process them automatically.
The machine learning part of the application was continuously checked against various key figures during development. In this benchmarking, several test sets were used to track how changes to the system affected detection reliability. Both individual components and the entire system were tested to ensure that recognition would work with the same reliability on future, previously unseen documents.
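
The kind of benchmarking described above can be illustrated with a small sketch. The source does not name the key figures used, so precision, recall and F1 are assumed here as typical choices for entity extraction. The function names, the tuple layout `(document_id, field, value)` and the tolerance are hypothetical.

```python
def precision_recall_f1(predicted, expected):
    """Compare two sets of (document_id, field, value) tuples."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def find_regressions(baseline, candidate, tolerance=0.01):
    """Return the test sets whose score dropped by more than `tolerance`."""
    return sorted(
        name
        for name, score in baseline.items()
        if candidate.get(name, 0.0) < score - tolerance
    )


if __name__ == "__main__":
    # Made-up example data, not results from the real system.
    expected = {
        ("d1", "patient", "Anna Muster"),
        ("d1", "total", "80.00"),
        ("d2", "patient", "Max Muster"),
    }
    predicted = {
        ("d1", "patient", "Anna Muster"),
        ("d1", "total", "80.00"),
    }
    print(precision_recall_f1(predicted, expected))
    print(find_regressions({"printed": 0.95, "handwritten": 0.80},
                           {"printed": 0.95, "handwritten": 0.70}))
```

Running such a comparison on several fixed test sets after every change makes it visible when an improvement for one document type degrades recognition for another.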

Digital archiving as an add-on
In addition to digitizing the cost reimbursement process, digital archiving of documents was also introduced. Documents no longer have to be stored on paper, only digitally.
A positive side effect of the digitization was that employees were able to work safely from home during the Covid-19 lockdown from March 2020 onwards. As a result of this emergency situation, the application was put into trial operation a few months earlier than planned and could therefore already be used for digital processing of the invoices.
Cloudflight
The software we have developed enables health insurance administrators to process reimbursements of medical bills faster, more efficiently and more conveniently, with less repetitive work. Cloudflight realized this project with experts from all areas: requirements engineers, scrum masters, software architects, DevOps and software engineers, and data scientists.

ITSV
ITSV is our implementation partner in this project. As an innovative technology company, it manages and coordinates the IT activities of the Austrian social insurance.