Document understanding: Modern techniques and real-world applications

Documents are at the centre of many business processes. Scanned pages and PDFs are ubiquitous and contain large amounts of information represented as forms and tables.

Historically, this information could only be analysed and used after manual data re-entry, a slow and error-prone process, because traditional optical character recognition (OCR) systems could not analyse such data and preserve its inherent structure in their output.

Document understanding extends document intelligence beyond plain text recognition to the retrieval of structured data. It relies heavily on machine learning and has proven key to automating structured data extraction, making the extracted data readily accessible for subsequent processing and analysis.

How document understanding works

"Understanding" a document first involves detecting its layout and key elements such as figures, tables, and forms. These elements are then processed separately to extract the underlying data relationships.

Any embedded forms are parsed into sets of key-value pairs, each pair corresponding to a single form field. An example of a key-value pair is "First name"–"Alice". The sets of such linked data items can subsequently be inserted into a database, one row or document per form.
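The "one row per form" idea can be sketched in a few lines of Python. This is only an illustration: the `parsed_forms` data, the `forms` table, and its column names are assumptions made for the example, and every field other than "First name" is invented.

```python
import sqlite3

# Hypothetical output of a form parser: one dict of key-value pairs per form.
parsed_forms = [
    {"First name": "Alice", "Last name": "Smith"},
    {"First name": "Bob", "Last name": "Jones"},
]


def store_forms(conn, forms):
    """Insert each parsed form as one row and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forms (first_name TEXT, last_name TEXT)"
    )
    # A missing field becomes NULL rather than raising an error.
    rows = [(form.get("First name"), form.get("Last name")) for form in forms]
    conn.executemany("INSERT INTO forms VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
store_forms(conn, parsed_forms)
print(conn.execute("SELECT first_name, last_name FROM forms").fetchall())
```

A document store would work the same way, with each dict saved as one document instead of one row.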

Document understanding products and services

The easiest way to incorporate document understanding into production workflows is to use existing cloud services. Major cloud providers each offer multiple machine learning-based services which include text and document intelligence. These offerings are summarised in the following table:

| Service | Provider | Description |
| --- | --- | --- |
| Amazon Textract | Amazon Web Services | Amazon Textract parses form data and tables. The service is integrated with Amazon Augmented AI (Amazon A2I) for implementing human review. |
| Document AI | Google Cloud | Previously known as Document Understanding AI, Document AI is capable of parsing forms, tables, and invoice content (the invoice feature is only available for approved customers). |
| Form Recognizer | Microsoft Azure | Form Recognizer extracts tables and key-value form pairs from documents and offers prebuilt models for analysing receipts and business cards. |
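To give a feel for what such a service returns, here is a rough sketch of turning a form-analysis response into key-value pairs. It assumes a response shaped like the block list that Amazon Textract returns for form analysis, where `KEY_VALUE_SET` blocks link to their values and to child `WORD` blocks. The `sample_response` below is hand-written and heavily trimmed, and the helper names are hypothetical; real responses carry more fields, such as geometry and confidence scores.

```python
def get_text(block, blocks_by_id):
    """Join the text of the WORD blocks listed as children of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Return a dict mapping each form key to the text of its linked value."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # Follow the link from the key block to its value block(s).
                value_text = " ".join(
                    get_text(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        pairs[get_text(block, blocks_by_id)] = value_text
    return pairs


# Hand-written sample for illustration, not real service output.
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "First"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Alice"},
    ]
}

print(extract_key_values(sample_response))
```

In practice the response would come from the service itself, for example from the boto3 Textract client's `analyze_document` call with `FeatureTypes=["FORMS"]`, rather than from a hand-built dict.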

Industry use cases

Document understanding is a key component of various emerging practical workflows and applications.

An example application of document understanding is invoice processing. Invoices are commonly sent as PDFs or paper documents that can be formatted in different ways but generally contain the same types of information, such as invoice date, amount due, and payment terms. By automatically recognising and extracting this information, cognitive invoicing systems speed up invoice processing and reduce the associated costs.
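The point that differently formatted invoices carry the same fields can be illustrated with a small normalisation step. This sketch swaps machine learning for a plain lookup table, so it is a deliberately simplified stand-in for what a real system would do. The alias sets, field names, and sample invoices are all made up for the example.

```python
# Hypothetical label variants; a production system would learn or configure these.
FIELD_ALIASES = {
    "invoice_date": {"invoice date", "date of invoice"},
    "amount_due": {"amount due", "total due", "balance due"},
    "payment_terms": {"payment terms", "terms"},
}


def normalise_invoice(pairs):
    """Map extracted key-value pairs onto canonical invoice fields."""
    result = {}
    for label, value in pairs.items():
        # Ignore case, surrounding whitespace, and a trailing colon in labels.
        cleaned = label.strip().rstrip(":").strip().lower()
        for field, aliases in FIELD_ALIASES.items():
            if cleaned in aliases:
                result[field] = value
                break
    return result


# Two invoices with different layouts and labels but the same kind of content.
invoice_a = {"Invoice Date:": "2021-03-01", "Total due": "150.00", "Terms": "Net 30"}
invoice_b = {"Date of invoice": "2021-03-05", "Amount Due": "99.50", "PO number": "12345"}

print(normalise_invoice(invoice_a))
print(normalise_invoice(invoice_b))
```

Labels that match no alias, such as "PO number" here, are simply dropped. A real pipeline would more likely route them to human review.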

The bottom line

By automating manual document activities, document understanding enables organisations to process documents more efficiently, reduce errors, and bring down costs. By helping extract the valuable information stored inside scanned and digital documents, it also supports search, discovery, and compliance control for these documents.

The extracted structured data can be ingested by various downstream business applications, enabling smarter workflows and more advanced processing at scale.

Made by Anton Vasetenkov.
