NLP, LLMs, DeepML, and FastML: The AI Under the Hood of ABBYY Intelligent Document Processing
by Egor Budnikov, Product Director, NLP and Language Models
The narrative around artificial intelligence (AI) is shifting from speculative hype to strategic implementation. Enterprises are entering a critical phase in which they carefully select partners to steer their AI initiatives toward tangible results, and a realization is spreading across the business world: AI's real value comes from a diverse set of specialized techniques, not from a single solution. A Forbes study highlighted the gap, reporting that 90 percent of generative AI initiatives in 2023 stalled before scaling to production, often because the technology was deployed as a point solution.
Against this backdrop, ABBYY distinguishes itself with a purpose-built AI platform for intelligent document processing (IDP). It delivers a robust blend of technologies such as generative AI and symbolic AI, designed with a focus on precision, consistency, and trust, to empower businesses to excel in the AI-driven landscape. In this article, we’ll explore a few of them.
Large language models (LLMs) and context injection
Even today, as many as 90 percent of business processes still run on documents. As enterprises begin to use generative AI within those processes, the need for IDP to support RAG (retrieval-augmented generation) and model fine-tuning initiatives grows. ABBYY helps enterprises succeed with generative AI by transforming their data into structures and embeddings that are easy to use in generative AI use cases. To get the accuracy and efficiency enterprises expect from an LLM, they need a knowledge base that is specific to their enterprise and business process.
Data often sits in silos within the organization, and much of it is locked away in business documents. Intelligent document processing extracts that data and gives it context. The extracted data and its context are then combined with the user's LLM prompt to make it context-aware, that is, specific to the enterprise and to the situation in which it is using the LLM, before a result is generated. Adding this context makes a large difference in the accuracy and quality of the result, and it also mitigates, to some extent, some of the more harmful behaviors of LLMs, such as hallucinations.
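The context-injection step can be sketched in a few lines. The example below is a minimal, hypothetical illustration, not ABBYY's implementation: it ranks text snippets (assumed to come from an IDP extraction step) by similarity to the user's question and places the best matches in the prompt ahead of the question. For simplicity it uses bag-of-words cosine similarity as a stand-in for learned embeddings; `build_prompt` and the sample snippets are invented for illustration, and no LLM is actually called.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bow_vector(text: str) -> Counter:
    """Bag-of-words term counts, used here in place of a learned embedding."""
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question: str, snippets: list[str], top_k: int = 2) -> str:
    """Rank extracted snippets by similarity to the question and inject
    the top_k best matches into the prompt as context."""
    q = bow_vector(question)
    ranked = sorted(snippets, key=lambda s: cosine(q, bow_vector(s)), reverse=True)
    context = "\n".join(f"- {s}" for s in ranked[:top_k])
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Hypothetical snippets, as they might come out of an IDP extraction step.
snippets = [
    "Invoice INV-1042 from Acme Corp totals 12,500 USD, due 2024-03-31.",
    "The master services agreement with Acme Corp renews annually on January 1.",
    "Employee handbook: remote work requires manager approval.",
]
prompt = build_prompt("When is the Acme invoice due?", snippets)
print(prompt)
```

In a production RAG setup, the bag-of-words vectors would be replaced by embeddings from an embedding model stored in a vector index, but the flow (retrieve, rank, inject, then prompt) stays the same.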
Convolutional neural networks and transformers
ABBYY's end-to-end approach to OCR and ICR was established several years ago. It is built on the same neural network technologies driving today's AI advances: convolutional neural networks, plus the transformers and large language models that also underpin ChatGPT and other LLMs. The LLM that ABBYY uses, however, is tailored specifically to our customers' need to extract value from the business documents that drive their processes.
The convolutional neural network analyzes an image of handwritten or printed text, breaking it down into low-level visual features to work out what each character and word actually is. The CNN's output is then passed to a transformer, which proposes candidate readings for each word. Finally, our own LLM, a model with billions of parameters, considers the context of all the words in a group and uses that information to decide on the most likely reading. This technique substantially improves the overall performance and accuracy of our OCR, and it is used in combination with our statistical approach. Our AI automatically decides which approach best fits your document use cases, optimizing on the fly for consistency, accuracy, and speed and leading to higher straight-through processing rates.
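The final step, where a language model weighs each word's candidates in light of its neighbors, resembles classic language-model rescoring of OCR output. The sketch below illustrates that general technique under stated assumptions and is not ABBYY's model: hypothetical recognizer confidences stand in for the CNN and transformer stages, a toy bigram table stands in for the LLM, and a Viterbi search picks the word sequence with the highest combined log score.

```python
import math

# Hypothetical recognizer output: per word position, candidate readings and
# their confidence, standing in for the CNN + transformer stages.
candidates = [
    {"total": 0.6, "tota1": 0.4},
    {"amount": 0.5, "arnount": 0.5},
    {"due": 0.7, "clue": 0.3},
]

# Toy bigram language model P(word | previous word), standing in for the LLM.
# "<s>" marks the start of the sequence.
bigram = {
    ("<s>", "total"): 0.2, ("<s>", "tota1"): 0.001,
    ("total", "amount"): 0.3, ("total", "arnount"): 0.001,
    ("tota1", "amount"): 0.01, ("tota1", "arnount"): 0.001,
    ("amount", "due"): 0.4, ("amount", "clue"): 0.001,
    ("arnount", "due"): 0.01, ("arnount", "clue"): 0.01,
}
UNSEEN = 1e-6  # probability assigned to bigrams missing from the table

def decode(candidates, bigram, lm_weight=1.0):
    """Viterbi search for the word sequence that maximizes
    sum(log P_ocr(word) + lm_weight * log P_lm(word | previous word))."""
    best = {"<s>": (0.0, [])}  # last word -> (best score, best path)
    for options in candidates:
        new_best = {}
        for word, p_ocr in options.items():
            # Extend every surviving path with this candidate; keep the best.
            new_best[word] = max(
                (score + math.log(p_ocr)
                 + lm_weight * math.log(bigram.get((prev, word), UNSEEN)),
                 path + [word])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

print(decode(candidates, bigram))  # ['total', 'amount', 'due']
```

Here the recognizer alone cannot separate "amount" from "arnount" (both at 0.5), but the surrounding words make "total amount due" the clear winner. The hypothetical `lm_weight` parameter controls how far context is trusted relative to the recognizer.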
Deep learning and fast machine learning (ML)
Deep learning allows us to pre-train AI models in our platform for very specific purposes. This differs sharply from the general-purpose models that open-source LLM projects and generative AI providers offer through an API. Generative AI is creating remarkable new ways to interact with technology, but it does not perform well on every task. ABBYY's deep learning models are trained for a specific purpose, and they excel at it.
As explained above, we combine many different technologies to deliver best-in-class intelligent document processing. In ABBYY Vantage, we combine deep machine learning and fast machine learning to maximize the straight-through processing rate. With deep learning alone, our pre-trained models give customers 90 percent accuracy right out of the box. Adding fast machine learning raises that accuracy above 95 percent. Fast machine learning memorizes the outliers that deep learning misses, and it learns quickly, from just a few variations of the documents in question. The data collected in that process then feeds back into our deep learning models, so their accuracy keeps improving over time. Achieving 99 percent is within reach, as our results working with the FDA demonstrate.
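One way to picture the division of labor between deep and fast learning is a pretrained model backed by a small memory of corrected examples. The sketch below assumes that framing; it is not a description of how FastML is implemented. `deep_model_predict` is a hypothetical stand-in for a pretrained model, and `FastMemory` is a 1-nearest-neighbor store that answers only for documents within `radius` of an example a user has already corrected.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

def deep_model_predict(features: tuple[float, ...]) -> str:
    """Hypothetical stand-in for a pretrained deep learning classifier."""
    return "invoice" if features[0] > 0.5 else "receipt"

@dataclass
class FastMemory:
    """Few-shot memory of user-corrected examples (1-nearest neighbor).
    Answers only for inputs within `radius` of a stored example."""
    radius: float = 0.2
    examples: list = field(default_factory=list)

    def learn(self, features: tuple[float, ...], label: str) -> None:
        self.examples.append((features, label))

    def lookup(self, features: tuple[float, ...]) -> Optional[str]:
        best = None  # (distance, label) of the closest stored example
        for stored, label in self.examples:
            distance = math.dist(stored, features)
            if distance <= self.radius and (best is None or distance < best[0]):
                best = (distance, label)
        return None if best is None else best[1]

def predict(features: tuple[float, ...], memory: FastMemory) -> str:
    """Use the memorized label for near-duplicates of corrected outliers;
    otherwise fall back to the deep model."""
    remembered = memory.lookup(features)
    return remembered if remembered is not None else deep_model_predict(features)

memory = FastMemory()
outlier = (0.9, 0.1)  # a document layout the deep model gets wrong
print(predict(outlier, memory))          # invoice (deep model alone)
memory.learn(outlier, "purchase_order")  # a single user correction
print(predict((0.88, 0.12), memory))     # purchase_order (memory hit)
print(predict((0.2, 0.7), memory))       # receipt (no nearby example)
```

A single correction is enough for the memory to handle near-identical documents, while anything unfamiliar still goes to the deep model. The feature vectors and the 0.2 radius are arbitrary values chosen for the example.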
Natural language processing (NLP)
With its intelligent approach, ABBYY IDP can process documents of any type and complexity, from fully structured forms to text-heavy, unstructured documents. Using natural language processing, Vantage can extract structured data from flowing text, such as a contract. This can include so-called named entities: names of persons and organizations, monetary amounts, dates, durations, locations, or addresses. Extracting such data from lengthy agreements and other unstructured documents can speed up and simplify a range of business processes, help knowledge workers become more efficient, and enable better, timelier customer service.
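To make "structured data from flowing text" concrete, the sketch below pulls monetary amounts, dates, and durations out of a contract clause. Note that it swaps in hand-written regular expressions for the trained deep learning NER described here; the patterns and the `extract_entities` helper are hypothetical and cover only a few English formats.

```python
import re

# Illustrative patterns for a few English formats; not a trained NER model.
PATTERNS = {
    "MONEY": re.compile(r"(?:USD|EUR|\$|€) ?\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"
    ),
    "DURATION": re.compile(r"\b\d+ (?:day|week|month|year)s?\b"),
}

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs for all patterns, in document order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    # Sort by character offset so entities come out in reading order.
    return [(label, value) for _, label, value in sorted(found)]

clause = ("This Agreement commences on March 1, 2024 and continues for 24 months. "
          "Customer shall pay USD 12,500.00 per quarter.")
for label, value in extract_entities(clause):
    print(label, value)
```

Rules like these are brittle (they miss "twenty-four months" or "12.500,00 EUR"), which is exactly the gap a trained NER model is meant to close.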
NLP provides the greatest value in use cases such as faster loan processing and approval, by quickly extracting and validating borrower data across loan origination documents; data privacy management and compliance, by extracting all personally identifiable information (PII) from complex, unstructured documents with minimal effort; and simpler contract management, analysis, and risk assessment, through efficient extraction of relevant names, dates, amounts, and other terms throughout the entire contract.
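For the PII use case, the end goal is often redaction. The minimal sketch below assumes a few US-style formats and masks email addresses, phone numbers, and Social Security numbers with placeholders. It is illustrative only: the patterns and the `redact` function are hypothetical, and a personal name such as "Jane" is left untouched, because detecting names requires entity recognition rather than pattern matching.

```python
import re

# Hypothetical, US-style patterns chosen for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}

def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each PII match with a [LABEL] placeholder; count matches per label."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts

sample = "Contact Jane at jane.doe@example.com or (555) 123-4567. SSN: 123-45-6789."
masked, counts = redact(sample)
print(masked)  # Contact Jane at [EMAIL] or [PHONE]. SSN: [SSN].
print(counts)  # {'EMAIL': 1, 'SSN': 1, 'PHONE': 1}
```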
With its deep learning capabilities for NLP, Vantage allows developers and business users alike to train the system to identify their own named entities, while giving them full control over, and transparency into, the training process.
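Training a model to recognize custom named entities usually starts from annotated examples. A common convention in NER generally (we make no claim that it matches Vantage's internal format) is to convert character-offset annotations into per-token BIO tags. The hypothetical `to_bio` helper below does this for whitespace-separated tokens; labels such as `TENANT` and `SIGN_DATE` are invented examples of user-defined entities.

```python
def to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Tag whitespace tokens with BIO labels from character-offset spans.
    B-LABEL starts an entity, I-LABEL continues it, O is outside any entity."""
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate this token's character offset
        end = start + len(token)
        pos = end
        tag = "O"
        for span_start, span_end, label in spans:
            if span_start <= start and end <= span_end:
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((token, tag))
    return tagged

def span_of(text: str, phrase: str, label: str) -> tuple[int, int, str]:
    """Example helper: build a (start, end, label) annotation for a phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

text = "Lease signed by Northwind Traders Ltd on 1 May 2024"
spans = [span_of(text, "Northwind Traders Ltd", "TENANT"),
         span_of(text, "1 May 2024", "SIGN_DATE")]
for token, tag in to_bio(text, spans):
    print(f"{token:10} {tag}")
```

Tokens that only partially overlap an annotated span are tagged `O` here; real tokenizers and annotation tools handle those boundary cases more carefully.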
AI innovation: the foundation for ABBYY IDP
Cutting-edge AI is built into every step of the intelligent document processing pipeline in ABBYY's IDP platform: image enhancement, object detection, OCR/ICR, classification, extraction from semi-structured documents, and NLP for unstructured documents.
- Image enhancement: Image source; document type detection; geometrical distortions; crop
- Object detection: Text (printed and handwritten); barcodes; checkmarks; stamps; signatures; faces; tables
- OCR/ICR: OCR for 200+ languages; ICR; document structure preservation
- Classification: Multimodal classification of document types; unsupervised clustering of documents by similar facets
- Extraction from semi-structured: DeepML, FastML; extraction rules; fixed forms
- NLP for unstructured: Segmentation; NER; DeepML; queries; summarization
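The stages listed above form a pipeline in which each step enriches the document and the classification result decides how extraction proceeds. The sketch below shows only that control flow, under heavy assumptions: every stage is a trivial, hypothetical stand-in (OCR and object detection are assumed to have already produced the `text` field), and the keyword-based routing rule is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str  # assumed: OCR/ICR output for the page, already produced
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

def enhance(doc: Document) -> Document:
    # Stand-in for image enhancement: here we only normalize whitespace.
    doc.text = " ".join(doc.text.split())
    doc.log.append("enhance")
    return doc

def classify(doc: Document) -> Document:
    # Stand-in for multimodal classification: a single keyword rule.
    doc.doc_type = "invoice" if "invoice" in doc.text.lower() else "contract"
    doc.log.append("classify")
    return doc

def extract_semi_structured(doc: Document) -> Document:
    # Stand-in for DeepML/FastML field extraction: parse "Key: value" pairs.
    for part in doc.text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            doc.fields[key.strip().lower()] = value.strip()
    doc.log.append("semi_structured")
    return doc

def extract_unstructured(doc: Document) -> Document:
    # Stand-in for NLP (segmentation, NER, summarization): keep the first sentence.
    doc.fields["summary"] = doc.text.split(". ")[0]
    doc.log.append("nlp")
    return doc

def run_pipeline(doc: Document) -> Document:
    for stage in (enhance, classify):
        doc = stage(doc)
    # Route by document type: forms to field extraction, free text to NLP.
    if doc.doc_type == "invoice":
        return extract_semi_structured(doc)
    return extract_unstructured(doc)

invoice = run_pipeline(Document("INVOICE   No: 1042;  Total: 12,500 USD"))
print(invoice.doc_type, invoice.fields)
contract = run_pipeline(Document("This Agreement is made between A and B. It renews yearly."))
print(contract.doc_type, contract.fields)
```

Keeping each stage as a separate function makes it easy to swap one implementation for another, which mirrors the idea of choosing the right technique for each step.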
Using the right combination of technologies and techniques, ABBYY IDP solutions can process any kind of document, in any format, language, or structure. All of our specialized techniques are optimized to deliver the most accurate inferences with the fewest resources, keeping costs down and delivering the best ROI for our customers.
Dive deeper into ABBYY’s approach to purpose-built AI for the enterprise on our AI Pulse podcast.