Blog: Transform Your Document Processing in Under 5 Minutes

Posted by Craig Bannerman

Create New Document Templates in Under 5 Minutes (Line-by-Line Extraction vs UiPath Document Understanding) – Part of the Document Understanding Series

In this technical blog we will provide insight into the methods used by VKY Automation to transform document processing.

These methods apply to additional templates and template updates. The automation process learns as it runs and can be adjusted both to meet changing requirements and to take on more complexity as employees’ confidence in the process grows.

Process Insights

Once a Document Extraction process is in place, requirements for additional templates and for updates to existing templates are almost certain to follow.

For example, these requirements could arise from the need to add a new supplier to your invoicing process, to add a new document type, or to handle a reformatted version of a document that you already process.

The Use Cases for utilizing ‘Line-By-Line Extraction’ are generally based around a small number of document templates like an ‘Internal Purchase Order Document’ that will always remain in the same format.

As these are internal, there may be a Document Versioning protocol in place to avoid an issue when a change is made to the Document format.

Line-By-Line extraction is the Digitization of a document (using OCR) and then Reading Line-By-Line. This process is similar to using Text Based Classifiers.

For example, to obtain the Currency Type, you find a relevant and unique text marker such as “Total:” and then format the matching line to get the desired result.

Figure 1.

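// Requires: using System; using System.IO; using System.Linq;
// Take the last line that starts with "Total:" (null if the document has none).
var currencyLine = File.ReadLines(extractData).LastOrDefault(e => e.StartsWith("Total:", StringComparison.Ordinal))?.Trim();

// The currency is taken from the third space-separated token on that line.
var currency = currencyLine?.Split(new[] { ' ' }).Skip(2).FirstOrDefault();

// Keep the first three characters; null signals that the currency could not be extracted.
out_currency = currency != null && currency.Length >= 3 ? currency.Substring(0, 3) : null;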

If you are seeking to apply the system to more than a few structured documents, it is advisable to follow the Document Understanding template process, because of the scalability and room for evolution it provides.

Creating Templates for a New Customer

  • Line By Line – This may take 2-4 hours and requires testing at least 10 different samples to ensure that no formatting on the page has changed. These samples should come from a variety of date ranges to ensure that minor changes do not affect results.

  • Document Understanding – This takes just 5 minutes for a new customer. We add the fields that we want to extract to the Taxonomy, add some Keyword Based Classifiers, and use one sample to anchor the points from which the data is extracted. If the system is not confident in its extraction, it will raise an action for Human Validation.
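To give a feel for what a keyword-based classifier does, here is a minimal sketch in plain C#. It does not use the UiPath activities; the class name, template names and keywords are all hypothetical, and the "confidence" is simply the fraction of a template's keywords found in the digitized text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordClassifierSketch
{
    // Hypothetical keyword sets per template; names and keywords are illustrative only.
    public static readonly Dictionary<string, string[]> TemplateKeywords =
        new Dictionary<string, string[]>
        {
            { "SupplierA_Invoice", new[] { "Supplier A Ltd", "Invoice No" } },
            { "Internal_PurchaseOrder", new[] { "Purchase Order", "Requisitioner" } },
        };

    // Returns the best-matching template and the fraction of its keywords that were found.
    // Template is null when no keyword of any template appears in the text.
    public static (string Template, double Confidence) Classify(string documentText)
    {
        var best = (Template: (string)null, Confidence: 0.0);
        foreach (var entry in TemplateKeywords)
        {
            int hits = entry.Value.Count(k =>
                documentText.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
            double confidence = (double)hits / entry.Value.Length;
            if (confidence > best.Confidence)
                best = (entry.Key, confidence);
        }
        return best;
    }
}
```

Adding a new customer in this sketch means adding one dictionary entry rather than writing new extraction code, which is the property that keeps template creation quick.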

Updating Templates

  • Line by Line – If the system is unable to extract the first field, it will not attempt the remainder. At this point, a manual test is required to find the issue. Updating the code so that it can extract from the failing sample could potentially break extraction for the original sample.

  • Document Understanding – We move the selectors or create an additional template to handle the 'alternative' format of the document.

Work Involved

  • Line by Line – Involves creating paths for each and every supplier, which can result in code that is very hard to maintain. In addition, if a change is required (outside of the data extraction itself), it would need to be replicated for each supplier.

The chart in figure 2 shows what it would look like for five suppliers and one unrecognized route.

Figure 2. 

  • Document Understanding – The Taxonomy Manager, Validation Station and Classifier Configuration keep the process in a Linear Flow.
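The per-supplier branching described for the Line by Line approach can be pictured with the rough sketch below. It is an illustration only: the supplier names, the markers and the "UNRECOGNIZED" route are hypothetical, and a real workflow would have one such path per field as well as per supplier.

```csharp
using System;
using System.Collections.Generic;

public static class SupplierRoutingSketch
{
    // Each supplier needs its own hand-written rule (hypothetical names and markers).
    private static readonly Dictionary<string, Func<string[], string>> Extractors =
        new Dictionary<string, Func<string[], string>>
        {
            { "SupplierA", lines => FindAfter(lines, "Total:") },
            { "SupplierB", lines => FindAfter(lines, "Amount Due:") },
            { "SupplierC", lines => FindAfter(lines, "Balance:") },
        };

    // Returns the text following the marker on the first line that starts with it, or null.
    private static string FindAfter(string[] lines, string marker)
    {
        foreach (var line in lines)
        {
            var trimmed = line.Trim();
            if (trimmed.StartsWith(marker, StringComparison.Ordinal))
                return trimmed.Substring(marker.Length).Trim();
        }
        return null;
    }

    // Suppliers without a rule fall through to the "unrecognized" route.
    public static string ExtractTotal(string supplier, string[] lines)
    {
        return Extractors.TryGetValue(supplier, out var extract)
            ? extract(lines)
            : "UNRECOGNIZED";
    }
}
```

Every new supplier adds another entry, and any shared change has to be checked against each one, which is where the maintenance cost comes from.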

Validation

  • Line By Line – Additional checks will be required for each document passed through, as the wrong line may have been picked up.

  • Document Understanding – A confidence level is calculated every time a document passes through, so Human Validation becomes less frequent over time. Human validation simply involves clicking 'Approve' in the UiPath Orchestrator when an 'Action' is raised. If the document is approved, it will have a higher confidence rating on the next run. The confidence threshold can be set as a variable, so we can start with lots of human validation and lower the threshold as the team becomes more satisfied that the robot is extracting the data correctly.
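The idea of a configurable confidence threshold can be sketched as follows. This is not the UiPath implementation; the class name and the 0.95 starting value are assumptions chosen purely for illustration.

```csharp
using System;

public static class ValidationGateSketch
{
    // Hypothetical threshold: start high so most documents are reviewed, then lower it.
    public static double ConfidenceThreshold = 0.95;

    // True when the extraction should be sent to a person for review.
    public static bool NeedsHumanValidation(double extractionConfidence)
    {
        return extractionConfidence < ConfidenceThreshold;
    }
}
```

With the threshold at 0.95, a document extracted at 0.90 confidence would be sent for review; once the team lowers the threshold to 0.80, the same document would pass straight through.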

Exceptions

  • Line by Line – If it cannot extract the first field, it will not attempt the rest, so a manual test is required to uncover the issue. Updating the code so that it can extract from the failing sample could then break extraction for the original sample.

  • Document Understanding – The human-in-the-loop can raise the document as an Exception and only needs to provide a reason for the exception. The exception will be sent back to the automation software, and the path followed will depend on the exception handlers.
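As a rough illustration of how the reason supplied by the human-in-the-loop could drive the path taken afterwards, consider the sketch below. The reasons, route names and the "ManualReview" fallback are hypothetical; a real process would define its own exception handlers.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptionRoutingSketch
{
    // Hypothetical reasons and routes, matched case-insensitively.
    private static readonly Dictionary<string, string> Routes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Illegible scan", "RequestRescan" },
            { "Unknown supplier", "CreateNewTemplate" },
        };

    // Returns the route for a reason, falling back to manual review for anything unlisted.
    public static string RouteFor(string reason)
    {
        return Routes.TryGetValue(reason ?? "", out var route) ? route : "ManualReview";
    }
}
```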

Results

By applying these methods, VKY Automation has delivered thousands of human hours of time savings while also vastly improving efficiencies and job satisfaction, all with a guarantee of overall cost savings.

Ready to find out what automation can do for you?  

Our automation is guaranteed to boost efficiencies, accelerate your innovation and streamline your processes in a cost-neutral way.   

With our Free Discovery Session we will help you identify processes to kick start or accelerate your automation journey with high volume, low complexity tasks that deliver results incredibly fast. We are so confident your new automation activity will deliver value that we offer guaranteed cost-neutrality, so you see the benefits of your savings. In most cases we deliver 300% vs cost.
