From challenges to solutions
More and more unstructured or only partly structured text data is produced every day. Receipts, forms, descriptions, contracts, order requests and technical documents are only a few examples. The desired information often has to be extracted manually, which costs a lot of time and human resources.
The value of the information hidden in stored documents is often underestimated. As a consequence, a lot of valuable information never makes it into the business workflow.
Automate the process of entity extraction out of various document types to enhance your business workflow.
A machine learning model is trained to extract custom named entities from unstructured text data. Entities can be, for example, names, dates, numbers, descriptions or prices.
We extract the information that is needed from unstructured text using AI.
We automatically extract your custom-defined entities based on your domain to shorten your customers' waiting times.
3. Named Entity Recognition (NER)
An AI model is trained to extract custom-defined entities. First, a labeled dataset has to be created. To do so, the text is extracted from the training documents via OCR. The labeling can then be performed in a tool that Cloudflight developed specifically for labeling texts and training NER models. With the final dataset, the model is trained and can then be used for future predictions.
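To make the labeling step more concrete, the following is a minimal sketch of how labeled text could be turned into training data for an NER model. It assumes that annotations are stored as character spans and converts them to token-level BIO tags, a common input format for sequence taggers. The function name `to_bio`, the label names and the example sentence are hypothetical; the source does not describe the format used by the labeling tool.

```python
import re


def to_bio(text, spans):
    """Convert character-level entity spans into token-level BIO tags.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Assumption: spans start and end on whitespace token boundaries.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # default: token is outside of any entity
        for start, end, label in spans:
            # A token belongs to an entity if it lies fully inside the span.
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical labeled example, as it might come out of a labeling tool.
text = "Invoice 4711 issued on 2021-03-15 to Jane Doe"
spans = [(8, 12, "INVOICE_NO"), (23, 33, "DATE"), (37, 45, "NAME")]

for token, tag in to_bio(text, spans):
    print(f"{token}\t{tag}")
```

Each token ends up with exactly one tag: `B-` marks the first token of an entity, `I-` a continuation, and `O` everything else. A list of such tagged sentences is what a typical NER training routine would consume.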
2. Optical Character Recognition (OCR)
The task of OCR is to extract the text from an image. For each word, a score is calculated that represents the probability that the word was extracted correctly. Additionally, handwriting detection is applied to find documents that carry additional notes. Handwritten numbers in predefined fields are detected and recognized.
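One way such per-word scores could be used downstream is sketched below: words whose score falls under a threshold are routed to manual review instead of being trusted automatically. The `OcrWord` structure, the score range of 0 to 1 and the threshold of 0.8 are assumptions for illustration and are not taken from the actual OCR component.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def split_by_confidence(words, threshold=0.8):
    """Separate words that can be trusted from words that need review."""
    accepted = [w for w in words if w.confidence >= threshold]
    review = [w for w in words if w.confidence < threshold]
    return accepted, review


# Hypothetical OCR output for one line of a receipt.
words = [
    OcrWord("Total", 0.98),
    OcrWord("amount", 0.95),
    OcrWord("1O4.50", 0.41),  # letter "O" misread for a zero, low score
]

accepted, review = split_by_confidence(words)
print("accepted:", [w.text for w in accepted])
print("needs review:", [w.text for w in review])
```

The threshold is a trade-off: a higher value sends more words to manual review, a lower one lets more potential misreadings through.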
1. Preprocessing
The preprocessing is adjusted to the type of document being processed. Photos of documents need a different kind of preprocessing than scanned documents. Typical preprocessing tasks are rotating and deskewing the image, improving the contrast and removing noise.
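Two of the preprocessing tasks mentioned above, contrast improvement and noise removal, can be sketched in a few lines. The example below is a simplified illustration on a grayscale image given as a list of rows with pixel values from 0 to 255. It uses linear contrast stretching and a 3x3 median filter, which are standard techniques but not necessarily the ones used in the actual pipeline; a real system would rely on an image processing library, and the function names are hypothetical.

```python
from statistics import median


def stretch_contrast(image):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def median_filter(image):
    """Apply a 3x3 median filter; border pixels use only existing neighbours."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


# A low-contrast patch: values only span 50..200 instead of 0..255.
print(stretch_contrast([[50, 100], [150, 200]]))  # [[0, 85], [170, 255]]

# A uniform patch with a single bright noise pixel in the middle.
noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
print(median_filter(noisy))  # the outlier is replaced by 100
```

The median filter suits isolated speckles because a single outlier never becomes the median of its neighbourhood, while edges of characters are largely preserved.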
Documents are scanned, so people no longer have to sort them manually and type their contents into a system.
- Do you have to process documents and extract certain information?
- Do you receive many free-text orders every day?
- Do you have a huge number of text documents that no one ever reads because there are too many?