AI-powered ID Information Extraction for a Forex Trading Platform
About the Client
This article describes a real-life project. However, we cannot disclose our client’s name or the project details for privacy reasons.
Our client is a fintech startup operating a forex trading platform. While headquartered in Singapore, the company has Japanese roots and primarily serves users across Asia. Operating in the financial sector, the client faces strict compliance requirements for user verification (KYC/AML) and needs a secure, scalable, and multilingual solution to streamline its onboarding process.
Project Overview
The project focuses on developing an OCR-based system capable of automatically extracting personal information from ID documents.
Industry
Financial Services
Technology
Computer Vision; Small Language Model (Generative AI-based); Azure Cognitive Services
Country
Japan
Timeline
03/2023 - 08/2023
Challenges
The client faces several key challenges in its onboarding and KYC processes:
Diverse document formats
The platform needs to validate more than 300 types of ID documents, with formats that differ by issuing country and change frequently.
Multilingual complexity
Serving users across Asia requires robust support for multiple languages, including Japanese, English, Chinese, Korean, Vietnamese, Malay, and Filipino.
Poor image quality
User-uploaded documents are often tilted, blurry, or poorly cropped, making it difficult to achieve accurate OCR results.
High operational costs
Manual data entry and verification through BPO services are costly, time-consuming, and prone to human error.
Solution
RikkeiSoft delivers an AI-powered ID Information Extractor that integrates Computer Vision, OCR, and Natural Language Processing into one streamlined workflow:
Computer Vision Pre-processing
The system automatically rotates, resizes, and enhances ID images to ensure clarity and consistency, which improves downstream recognition accuracy.
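The pre-processing step can be illustrated with a minimal sketch. It assumes a grayscale image held as a plain list of pixel rows, rotation limited to quarter turns, and a simple linear contrast stretch as the "enhancement". All function names are hypothetical, and a production pipeline would more likely rely on an imaging library that handles arbitrary skew angles and blur.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Resize with nearest-neighbour sampling."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale pixel values so they span the full 0-255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def preprocess(img: Image, quarter_turns: int, target_w: int) -> Image:
    """Rotate upright, resize to a fixed width (keeping aspect ratio), enhance."""
    for _ in range(quarter_turns % 4):
        img = rotate_90_clockwise(img)
    h, w = len(img), len(img[0])
    target_h = max(1, round(h * target_w / w))
    return stretch_contrast(resize_nearest(img, target_w, target_h))
```

For example, `preprocess(scan, quarter_turns=1, target_w=1024)` would turn a sideways upload upright and normalise its width before OCR. How the required rotation is detected is outside this sketch.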
Text Detection & OCR
Advanced text region detection identifies relevant areas of the ID card, while Azure OCR services securely extract the raw text from these regions.
Small Language Model (Generative AI-based)
A fine-tuned small language model parses OCR outputs and accurately extracts structured personal data fields such as full name, date of birth, nationality, and ID number. This step reduces errors caused by variations in ID formats.
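One way to picture this step is as a prompt that asks the model for a fixed set of keys, followed by strict parsing of the reply. The sketch below is an assumption about how this could look, not the project's actual implementation: the field names, the `IdRecord` class, the prompt wording, and the JSON reply format are all hypothetical, and the call to the model itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names matching the examples given in the text.
FIELDS = ("full_name", "date_of_birth", "nationality", "id_number")


@dataclass
class IdRecord:
    full_name: Optional[str] = None
    date_of_birth: Optional[str] = None
    nationality: Optional[str] = None
    id_number: Optional[str] = None


def build_prompt(ocr_text: str) -> str:
    """Build an instruction asking the model for a JSON object with fixed keys."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the ID text and reply with a JSON "
        f"object using exactly these keys: {keys}. "
        "Use null for missing values.\n\n"
        f"ID text:\n{ocr_text}"
    )


def parse_model_reply(reply: str) -> IdRecord:
    """Parse the model's JSON reply; unknown keys are ignored and
    missing, empty, or non-string values become None."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return IdRecord()
    if not isinstance(data, dict):
        return IdRecord()
    cleaned = {}
    for name in FIELDS:
        value = data.get(name)
        if isinstance(value, str) and value.strip():
            cleaned[name] = value.strip()
    return IdRecord(**cleaned)
```

Keeping the output schema fixed regardless of the document type is what lets the same downstream code handle many ID layouts. Fields the model could not fill stay `None` and can be picked up by later checks.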
Validation & Error Handling
The extractor applies rule-based validation (e.g., date format checks, numeric field verification) and confidence scoring to flag uncertain cases for manual review, ensuring both speed and reliability.
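The validation logic described above might be sketched as follows. This is an illustration under stated assumptions: each field arrives as a `(value, confidence)` pair from the upstream steps, dates are already normalised to `YYYY-MM-DD`, the ID number is purely numeric with 6 to 20 digits, and the threshold of 0.85 is an arbitrary placeholder. Real rules would differ per document type.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cut-off, to be tuned on real data


def validate_date(value: str) -> bool:
    """Date format check: a real calendar date in YYYY-MM-DD, not in the future."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return False
    return parsed <= date.today()


def validate_id_number(value: str) -> bool:
    """Numeric field check: ASCII digits only, 6 to 20 characters (assumed rule)."""
    return value.isascii() and value.isdigit() and 6 <= len(value) <= 20


def review_record(fields: Dict[str, Tuple[str, float]]) -> List[str]:
    """Return the reasons a record needs manual review; empty means auto-accept."""
    reasons = []
    # Confidence scoring: flag any field the upstream steps were unsure about.
    for name, (value, confidence) in fields.items():
        if confidence < CONFIDENCE_THRESHOLD:
            reasons.append(f"{name}: low confidence ({confidence:.2f})")
    # Rule-based checks on specific fields.
    dob = fields.get("date_of_birth")
    if dob is None or not validate_date(dob[0]):
        reasons.append("date_of_birth: invalid or missing date")
    id_number = fields.get("id_number")
    if id_number is None or not validate_id_number(id_number[0]):
        reasons.append("id_number: must be 6-20 digits")
    return reasons
```

A record with an empty result can flow straight through onboarding, while any non-empty list routes the document to a human reviewer together with the reasons it was flagged.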
Scalable & Secure Design
The solution is architected to scale seamlessly for different ID formats across new markets, while maintaining strict compliance with financial data privacy and security standards.
Results
80% Time Reduction
70% Cost Reduction
The implementation delivers measurable improvements in both efficiency and customer experience. The system now supports over 300 types of ID documents across multiple Asian countries and provides robust multilingual recognition. Document processing time falls by 80%, significantly accelerating customer onboarding, while operational costs drop by 70% thanks to the elimination of manual data entry. Most importantly, the client achieves a more secure, reliable, and user-friendly verification process, strengthening customer trust and improving overall satisfaction.