Improve Verification with Large Language Model-based ID Information Extractor

The client’s Forex trading platform requires a strict verification process to prevent fraud and scams. Because Rikkeisoft was already developing the platform with the client, it also took on this crucial issue during development.

About the client

This article describes a real-life project. However, we cannot disclose our client’s name for privacy purposes.

The client is a Singapore-based start-up founded in 2017, mainly focusing on the Japanese market. Their primary goal is to assist brokers and traders of Forex, CFD, and securities through various solutions based on MetaTrader 4 and 5 platforms.

As the development of their Forex platform went on, the client discovered that a strict verification process would be required to prevent fraud and scams. This includes verifying the user’s name, address, ID numbers, date of birth, and face, as well as validating the ID’s legitimacy. All this would also have to be done automatically due to the platform’s large user base.

Project Overview

Industry

BFSI

Technology

Python, PyTorch, FastAPI, OpenCV, NumPy, Hugging Face, Docker

Country

Singapore

Duration

6 months

Challenges

Diversity of ID formats and languages

The client’s user base is expansive, spanning multiple countries. This brings close to 300 different ID document formats and over 10 languages, written in logosyllabic, syllabic, and alphabetic scripts, among others. The set of ID document formats also changes constantly as new formats are introduced and old ones are phased out.

Unstandardized document photos

The client had to work with pictures of ID documents that computer systems cannot normally process as-is. The documents might be crooked, skewed, or inappropriately sized because of users’ mistakes when taking the picture. Such pictures need additional preparation before they can be used for verification.

Solution

OCR- and Large Language Model-based Information Extraction System

To assist the client in verification, Rikkei AI developed an information extraction tool that can recognize and pull information from pictures of ID documents. After receiving input from users (document photos), the tool first classifies the pictured document as either “card” or “paper” based on the physical properties shown. It then standardizes the image if necessary by resizing and/or rotating it.
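The classification and standardization step can be sketched as follows. This is a minimal illustration and not the project's actual implementation: the article does not say which physical properties are used, so the sketch assumes the aspect ratio of the detected document outline, compared against the ISO/IEC 7810 ID-1 card size (85.60 mm x 53.98 mm). The tolerance, the target size, the `StandardizationPlan` type, and the `detected_rotation` input are all hypothetical.

```python
from dataclasses import dataclass

# ISO/IEC 7810 ID-1 cards measure 85.60 mm x 53.98 mm.
ID1_ASPECT = 85.60 / 53.98   # roughly 1.586
ASPECT_TOLERANCE = 0.08      # hypothetical tolerance, to be tuned on real data
MAX_LONG_SIDE = 1600         # hypothetical maximum long side in pixels


@dataclass
class StandardizationPlan:
    category: str                  # "card" or "paper"
    rotate_degrees: int            # clockwise correction: 0, 90, 180 or 270
    target_size: tuple[int, int]   # (width, height) after rotating and resizing


def classify(width: int, height: int) -> str:
    """Guess the category from the aspect ratio of the document outline."""
    aspect = max(width, height) / min(width, height)
    return "card" if abs(aspect - ID1_ASPECT) <= ASPECT_TOLERANCE else "paper"


def plan_standardization(width: int, height: int,
                         detected_rotation: float) -> StandardizationPlan:
    """Decide how to rotate and resize a photo before OCR.

    detected_rotation is the clockwise rotation of the content in degrees,
    assumed to come from an upstream orientation estimator.
    """
    category = classify(width, height)

    # Snap the estimate to the nearest quarter turn, then undo it.
    snapped = round(detected_rotation / 90) * 90 % 360
    correction = (360 - snapped) % 360

    # A quarter-turn correction swaps width and height.
    if correction in (90, 270):
        width, height = height, width

    # Only downscale; upscaling small photos is left as a separate decision.
    scale = min(1.0, MAX_LONG_SIDE / max(width, height))
    target = (round(width * scale), round(height * scale))
    return StandardizationPlan(category, correction, target)
```

For example, a 3200 x 2018 photo whose content is rotated about 92 degrees clockwise would be treated as a card, rotated by 270 degrees, and scaled down to 1009 x 1600. Fine-grained skew correction (a few degrees) would need an actual image operation and is outside this sketch.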

After standardization, the tool uses Microsoft’s Azure Optical Character Recognition (OCR) Cognitive Service to detect text fields in the document. These text fields then go through Rikkeisoft’s own fine-tuned Large Language Model (LLM), similar to OpenAI’s models, for information extraction. This LLM can handle over 100 languages, including Japanese, Chinese, and Korean.
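The hand-off from OCR output to LLM-based extraction could look roughly like the sketch below. It assumes the OCR step has already produced a list of text lines and that the model is asked to reply with a JSON object. The prompt wording, the `IdRecord` type, and the `llm` callable are hypothetical stand-ins; the article does not describe the real prompt or the model's interface. The field names follow the items the client needs to verify (name, address, ID number, date of birth).

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

FIELDS = ("name", "address", "id_number", "date_of_birth")


@dataclass
class IdRecord:
    name: Optional[str] = None
    address: Optional[str] = None
    id_number: Optional[str] = None
    date_of_birth: Optional[str] = None


def build_prompt(ocr_lines: list[str]) -> str:
    """Turn OCR text lines into an extraction prompt (hypothetical wording)."""
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(ocr_lines, 1))
    return (
        "Extract the following fields from the OCR text of an ID document "
        f"and answer with a single JSON object with the keys {list(FIELDS)}. "
        "Use null for any field that is not present.\n\n" + numbered
    )


def extract(ocr_lines: list[str], llm: Callable[[str], str]) -> IdRecord:
    """Call the model and keep only well-formed, expected fields."""
    raw = llm(build_prompt(ocr_lines))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return IdRecord()          # unparseable reply: treat as nothing extracted
    if not isinstance(data, dict):
        return IdRecord()

    clean = {}
    for key in FIELDS:
        value = data.get(key)
        # Accept non-empty strings only; unexpected keys are ignored.
        if isinstance(value, str) and value.strip():
            clean[key] = value.strip()
        else:
            clean[key] = None
    return IdRecord(**clean)
```

Validating the reply before using it matters because a language model can return malformed or incomplete output; an empty record can then be routed to a retry or a manual review rather than passed on to verification.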

When the tool has finished with extraction, it continues to the verification step. Depending on the type of document, this is done through either photo or ID number verification.
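The two verification paths can be illustrated with a small sketch. The article does not describe how either check is performed, so everything here is an assumption: ID number verification is shown as a comparison between the extracted number and a reference value after Unicode normalization (useful when documents contain full-width digits), and photo verification is shown as a cosine-similarity comparison between two face embeddings produced by some upstream model. The threshold and function names are hypothetical.

```python
import math
import unicodedata

FACE_THRESHOLD = 0.6  # hypothetical similarity threshold, to be tuned on real data


def normalize_id(value: str) -> str:
    """Fold full-width characters to ASCII and drop separators and spaces."""
    folded = unicodedata.normalize("NFKC", value)
    return "".join(ch for ch in folded if ch.isalnum()).upper()


def id_numbers_match(extracted: str, expected: str) -> bool:
    """Compare two ID numbers after normalization; empty values never match."""
    a, b = normalize_id(extracted), normalize_id(expected)
    return bool(a) and a == b


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def faces_match(id_embedding: list[float], selfie_embedding: list[float],
                threshold: float = FACE_THRESHOLD) -> bool:
    """Compare face embeddings from the ID photo and the user's photo."""
    if len(id_embedding) != len(selfie_embedding):
        raise ValueError("embeddings must have the same length")
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold
```

With this normalization, an extracted value such as "１２３４ ５６７８" would match a reference value of "1234-5678", since both reduce to the same digit string.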

Result

The client is highly satisfied with Rikkeisoft’s final solution, as it has helped them reduce the risk of fraudulent activities and scams, which are prevalent in the finance and FinTech industry. Additionally, because the tool is automated, the client does not need dedicated teams for the task, which saves both time and cost. The tool is also built with future changes in mind, simplifying the process of adding new ID formats for the client.

Overall, Rikkeisoft’s information extraction system has proven highly helpful in the client’s verification process. It has also further demonstrated Rikkeisoft’s capability to build AI solutions for FinTech companies.