OCR (Optical Character Recognition) technology keeps evolving, even amid the COVID-19 crisis. Leading OCR companies, such as OpenText, continue to invest in the technology. If the trend toward digital transformation extends through the crisis, technology that lets us digitize handwritten or printed text will stay in demand. OCR, for instance, is widely used for business process automation.
Optical character recognition technology has its roots in the 1990s. You may remember those old-fashioned scanners and the slow software that turned scanned images into recognized text.
In the era of Artificial Intelligence, OCR technology has improved in quality, speed, and range of use cases. It's no longer just about recognizing printed text: today it handles handwritten documents, road signs, characters in photos taken in the wild, car plates, ID data, and much more.
Depending on the specific use case, the approach to OCR implementation will differ. In general, there is a pre-processing stage to "clean" and prepare the image. Then a trained neural network is applied to "read" the picture and extract the text. There are also supporting specialized neural networks that help correct mistakes and improve the resulting quality.
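As a minimal sketch of the pre-processing stage described above, here is a toy binarization step: turning a grayscale image into clean black-and-white pixels before a recognition model reads it. The image is represented as a plain list of lists of 0–255 intensities purely for illustration; a real pipeline would use a library such as OpenCV or Pillow and more sophisticated cleaning (deskewing, denoising, adaptive thresholding).

```python
def binarize(gray_image, threshold=128):
    """Turn a grayscale image into a black-and-white one.

    Pixels darker than the threshold become 0 (likely text strokes),
    lighter pixels become 255 (background). The fixed threshold is a
    simplification; production systems use adaptive methods.
    """
    return [
        [0 if pixel < threshold else 255 for pixel in row]
        for row in gray_image
    ]

# A tiny 2x3 "image": dark strokes on a light background.
sample = [[30, 200, 90],
          [210, 40, 180]]
print(binarize(sample))  # [[0, 255, 0], [255, 0, 255]]
```

The cleaned, high-contrast result is what actually gets handed to the recognition network in the next stage.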
Neural networks can be designed and trained from scratch, but that is often a research task requiring plenty of data. More often, an already pre-trained neural network is used as a base, and the final training is done with far less data and effort. If the case is typical, such as printed-text recognition, ready-to-use solutions and services are available. That is probably an excellent way to go if no customization is needed.
This article highlights MobiDev's implementation of OCR technology in an automated identity verification solution: a system that matches user photos against official identification documents such as driver's licenses.
Face embeddings, i.e. facial features extracted from driver's licenses, were then matched against a user's photo; machine learning face recognition technologies were applied for this.
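The matching step can be sketched as comparing two embedding vectors by cosine similarity. The vectors and the 0.8 threshold below are toy values chosen for illustration, not MobiDev's actual parameters; real face-recognition models produce 128- or 512-dimensional embeddings, and the decision threshold is tuned on validation data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(id_embedding, selfie_embedding, threshold=0.8):
    """Illustrative decision rule: accept the match if similarity clears the threshold."""
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold

id_vec = [0.1, 0.9, 0.2, 0.4]          # embedding from the license photo
selfie_vec = [0.12, 0.85, 0.25, 0.38]  # embedding from the user's selfie
print(is_same_person(id_vec, selfie_vec))  # True for these near-identical vectors
```

Lowering the threshold reduces false rejections but weakens security, which is exactly the UX-versus-security trade-off discussed later in the article.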
Along with the photo, additional information such as name, surname, DOB, and other data from a driver's license had to be recognized. That's where OCR came in. In some cases, the front-side information also needed to be matched against the back-side barcode data.
OCR solutions seem capable of reading pictures and extracting the relevant information; however, most off-the-shelf products proved ineffective for this task. In short, decision-makers are advised to run real-scenario tests before implementation. Our study employed Optical Character Recognition SDKs and APIs. Here are some points to take into account before choosing an OCR solution.
Driver's license and other official ID photos usually show their subjects in formal poses, while selfie shots are casual. In addition, the quality of government-issued ID photos is frequently lower than that of most photos, and quality differences are unavoidable since smartphone cameras vary.
OCR is sensitive to the quality and uniformity of its input data, and person identification depends on it, so the rate of false rejections is difficult to keep under control. Numerous false rejections (cases when we cannot match a person with their ID) hurt the product and lead to poor UX. So it's a balance between secure user identification and smooth UX.
For the system in general, improving the quality of selfie submissions, both in positioning and in photo uploads, contributes positively to the user experience. Users can be prompted with simple instructions during the picture-taking process, which results in better quality and fewer false rejections.
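One way such prompting can work is a simple quality gate that rejects obviously unusable uploads before they ever reach the OCR or face-matching stage. The check below is an illustrative heuristic, not MobiDev's actual implementation: it flags images whose intensity variance is too low, a symptom of blank, over- or under-exposed, or heavily blurred shots. Production systems would use stronger signals (e.g. Laplacian variance via OpenCV).

```python
def intensity_variance(gray_image):
    """Population variance of pixel intensities across the whole image."""
    pixels = [p for row in gray_image for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def passes_quality_gate(gray_image, min_variance=100.0):
    """Illustrative gate: ask the user to retake the photo if variance is too low."""
    return intensity_variance(gray_image) >= min_variance

flat = [[128, 129], [127, 128]]      # nearly uniform: likely unusable
contrasty = [[10, 240], [230, 20]]   # strong contrast: likely usable
print(passes_quality_gate(flat))       # False
print(passes_quality_gate(contrasty))  # True
```

When the gate fails, the app can show a retake prompt ("move to better lighting, hold the camera steady") instead of submitting a photo that would later cause a false rejection.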
Standardizing user-submitted photos makes it possible to balance a positive UX with high security standards. When choosing an OCR system, decision-makers should consider the raw data that will be fed to it. In other words: what will the pictures your users provide look like? Most computer-vision-based OCR solutions magically yield results, but understanding what goes into the "black box" is a must.
For instance, driver's license offices in the United States lack uniformity, which presents a significant challenge to OCR implementation. Photo quality varies from county to county, and every state has a unique driver's license format that changes frequently. This makes it difficult to create templates for OCR processing and license parsing.
Our first step was to assemble a small data set of 150 IDs and 100 driver's licenses. The team gathered data from open data sets and internal data sets. The data set had to be close to what users would later provide to the system.
With the data in hand, we had a green light to compare and evaluate open-source and commercial OCR data-parsing tools. A short list of the OCR systems best suited to the project goals was generated.
Ultimately, Google Vision AI proved a sound choice for the project. The team tracked metrics that rated the accuracy of DOB, name, and surname matches.
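After the engine returns raw text, a parsing step has to pull out the tracked fields (name, surname, DOB). The sketch below is hypothetical: the field labels ("LN", "FN", "DOB") and layout are invented for illustration, and real licenses vary by state, which is exactly why templating is hard.

```python
import re

def parse_license_text(raw_text):
    """Extract name, surname, and DOB from raw OCR text (illustrative labels)."""
    fields = {}
    surname_match = re.search(r"LN\s+([A-Z]+)", raw_text)   # "LN" = last name
    name_match = re.search(r"FN\s+([A-Z]+)", raw_text)      # "FN" = first name
    dob_match = re.search(r"DOB\s+(\d{2}/\d{2}/\d{4})", raw_text)
    if surname_match:
        fields["surname"] = surname_match.group(1)
    if name_match:
        fields["name"] = name_match.group(1)
    if dob_match:
        fields["dob"] = dob_match.group(1)
    return fields

ocr_output = "4d LN DOE\n1 FN JANE\nDOB 03/14/1985\nEXP 03/14/2027"
print(parse_license_text(ocr_output))
# {'surname': 'DOE', 'name': 'JANE', 'dob': '03/14/1985'}
```

Accuracy metrics like the ones the team tracked are then simply the fraction of documents for which these parsed fields match the ground truth.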
Some organizations rely on large data sets or open databases containing book scans or images with text. But the team working on the project determined that a study of driver's licenses and their images did not require a large data set. The decision followed from study-specific challenges: variable driver's license templates, low-quality photographs, and the constraints of the OCR systems themselves. The ability to adjust OCR parameters is another key point to take into account: most OCR engines could not be adjusted or customized to meet the product's needs.
Parsing actual driver's licenses provided the truest indicator of an OCR solution's fitness. The study concludes that the most successful data set was one tailored to the task and aligned with the target audience's expectations.
Ultimately, it was found that hybrid solutions were required. The case study concludes that security-compliant OCR solutions were achieved when the data science, ML, and software engineering teams collaborated on secure data collection processes.
This isn't to say that the study's researchers experienced seamless OCR engine implementation. Rather, their biggest challenge was cross-referencing raw Google Vision data with barcode-delivered data at 100% accuracy. Despite Google Vision's sophistication, the engine still made mistakes. In some situations mistakes are negligible; however, when it comes to security protocols, errors are unacceptable. As a result, the need for additional security layers to prevent fraudulent attempts became apparent. This reveals one limitation of OCR engines.
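The cross-referencing step can be sketched as a field-by-field comparison between what OCR read from the front side and what was decoded from the back-side barcode; the field names below are illustrative. Any mismatch flags the document for extra review rather than silently trusting the OCR output.

```python
def cross_check(ocr_fields, barcode_fields, required=("name", "surname", "dob")):
    """Return the list of required fields where the two sources disagree."""
    mismatches = []
    for field in required:
        ocr_value = (ocr_fields.get(field) or "").strip().upper()
        bar_value = (barcode_fields.get(field) or "").strip().upper()
        if ocr_value != bar_value:
            mismatches.append(field)
    return mismatches  # an empty list means the two sides agree

front = {"name": "JANE", "surname": "D0E", "dob": "03/14/1985"}  # OCR misread 'O' as '0'
back = {"name": "JANE", "surname": "DOE", "dob": "03/14/1985"}   # decoded from the barcode
print(cross_check(front, back))  # ['surname']
```

Because the barcode is machine-encoded, it serves as a reliable reference, so a disagreement usually points at an OCR error or a tampered document.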
The Machine Readable Zone (MRZ) is a standardized data field found on travel documents. Each position in the field can contain a letter, a digit, or a filler character. The MRZ also contains check digits that help ensure data-parsing accuracy.
It was found that Google Vision's accuracy, when applied to MRZ recognition, was well below 100%. As a result, the team found it necessary to employ additional data cross-checking processes.
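MRZ check digits follow the ICAO Doc 9303 scheme, which makes this kind of cross-checking possible: each character maps to a value (digits as-is, A=10 through Z=35, the '<' filler is 0), values are multiplied by a repeating 7-3-1 weight, and the sum is taken modulo 10. Recomputing the digit over a recognized field catches many OCR misreads.

```python
def mrz_check_digit(field):
    """Compute the ICAO 9303 check digit for an MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":
            value = 0          # filler character counts as zero
        else:
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        total += value * weights[i % 3]
    return total % 10

# Standard ICAO 9303 example: document number "D23145890" has check digit 7.
print(mrz_check_digit("D23145890"))  # 7
```

If the recomputed digit differs from the one printed in the MRZ, at least one character in the field was misrecognized and the document can be re-scanned or escalated.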
After evaluating the Optical Character Recognition (OCR) system's data-parsing accuracy, the team found patterns within the errors. Similar-looking symbols such as '1', 'l', 'I' and 'O', 'D', 'Q', '0' were frequently mismatched. The symbols' visual similarity was likely responsible for the engine's misclassification, so additional algorithms were introduced to cross-check those variants.
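One simple form such a cross-checking algorithm can take (a sketch, not the team's actual code) is a normalization pass over fields known to be numeric, such as dates and document numbers: letters the engine commonly confuses with digits are mapped back before validation. The mapping covers the confusion pairs observed in the study.

```python
# Letters commonly misrecognized in place of digits, per the error patterns above.
CONFUSABLE_TO_DIGIT = {"l": "1", "I": "1", "O": "0", "D": "0", "Q": "0"}

def normalize_numeric_field(text):
    """Map confusable letters back to digits in a field expected to be numeric."""
    return "".join(CONFUSABLE_TO_DIGIT.get(ch, ch) for ch in text)

print(normalize_numeric_field("O3/I4/l985"))  # '03/14/1985'
```

Crucially, this correction is only safe on fields whose type is known in advance; applying it to free text would corrupt legitimate letters.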
Based on your specific business use cases, you will have plenty of your own items to add to this agenda.
Oleksii Tsymbal enjoys rock concerts. While they are all off, he is rocking and working as Chief Innovation Officer at MobiDev. Machine learning consulting is one of his core focuses. He believes product owners have to drive business with vision and passion, and his role is to support them and make product delivery happen, no matter what!