The Problem
A financial data provider based in Bulgaria set an ambitious goal: to build the largest comprehensive database of company financials in the region. However, several obstacles threatened this vision. Multiple OCR vendors had failed to meet the client's accuracy and scalability requirements, leaving 40% of important financial data locked in low-quality documents.
Main Technical Challenges
Inaccurate Information Extraction
Existing solutions failed to extract data accurately
Low-Quality PDFs
Poor document quality reduced extraction effectiveness
Dispersed Financial Data
Data spread across various statement types with different representations
Manual Intervention
Missing data points required manual adjustment
Document Rejection
System rejected documents that didn't adhere to templates, causing silent data loss and hindering database growth
The client's existing system created a significant bottleneck: it labeled any document with missing information as "data deficient" and rejected it outright, because it was not built to handle files with incomplete data. As a result, the client discovered significant information gaps it had not previously recognized. Consistent with its standard manual-processing procedure, the client asked us to overcome this obstacle and keep filling the template with derived data wherever values were missing.
Our Solution
As a proof of concept (POC), we processed around 50 documents of varying difficulty. Our financial data extraction agent extracted the information with 95% accuracy, which the client found highly impressive. This led to a scaled-up engagement in which we processed 1,000 documents in batches.
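The case study does not say how the 95% figure was measured. One common convention is field-level accuracy: the share of template fields whose extracted value matches a human-verified value. The sketch below assumes that definition; `field_accuracy`, `rel_tol`, and the field names are hypothetical illustrations, not the agent's actual metric.

```python
from typing import Dict, Optional


def field_accuracy(
    extracted: Dict[str, Optional[float]],
    ground_truth: Dict[str, float],
    rel_tol: float = 0.0,
) -> float:
    """Share of ground-truth fields whose extracted value matches (hypothetical metric).

    A field counts as correct only if it was extracted (not missing/None) and
    lies within rel_tol (relative tolerance) of the verified value. With the
    default rel_tol=0.0 an exact match is required.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one field")
    correct = 0
    for name, expected in ground_truth.items():
        value = extracted.get(name)
        if value is not None and abs(value - expected) <= rel_tol * abs(expected):
            correct += 1
    return correct / len(ground_truth)


truth = {"revenue": 1200.0, "cost_of_sales": 800.0,
         "total_assets": 5000.0, "total_equity": 2000.0}
extracted = {"revenue": 1200.0, "cost_of_sales": 800.0,
             "total_assets": 5000.0}  # total_equity was not extracted
print(field_accuracy(extracted, truth))  # 0.75
```

Counting a missing field as an error matters here: a system that silently drops hard fields would otherwise look more accurate than it is.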
Solution Process Flow
Document Ingestion
Process various document formats
AI Processing
Extract and verify financial data
Data Reconstruction
Fill missing data points
Database Integration
Structured data storage
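The four stages above can be sketched as a chain of swappable functions. This is a minimal illustration under stated assumptions: `Pipeline`, the `toy_*` stand-ins, the "field=value" input format, and the list used as a "database" are all hypothetical and stand in for the real OCR, AI extraction, and storage components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


@dataclass
class Pipeline:
    """Chains the four stages; each stage is a plain, swappable function."""
    ingest: Callable[[bytes], str]            # raw file -> text (e.g. OCR output)
    extract: Callable[[str], Record]          # text -> template fields
    reconstruct: Callable[[Record], Record]   # fill missing data points
    store: Callable[[Record], None]           # persist the structured record

    def run(self, raw: bytes) -> Record:
        record = self.reconstruct(self.extract(self.ingest(raw)))
        self.store(record)
        return record


# Toy stand-ins for illustration only.
def toy_ingest(raw: bytes) -> str:
    return raw.decode("utf-8")


def toy_extract(text: str) -> Record:
    # Parses "field=value" lines; an empty value means the field is missing.
    record: Record = {}
    for line in text.splitlines():
        name, _, value = line.partition("=")
        record[name.strip()] = float(value) if value.strip() else None
    return record


def toy_reconstruct(record: Record) -> Record:
    # Derive gross profit when it is missing but its inputs are present.
    out = dict(record)
    if out.get("gross_profit") is None and None not in (
        out.get("revenue"), out.get("cost_of_sales")
    ):
        out["gross_profit"] = out["revenue"] - out["cost_of_sales"]
    return out


database: List[Record] = []
pipeline = Pipeline(toy_ingest, toy_extract, toy_reconstruct, database.append)
print(pipeline.run(b"revenue=1200\ncost_of_sales=800\ngross_profit="))
# {'revenue': 1200.0, 'cost_of_sales': 800.0, 'gross_profit': 400.0}
```

Keeping each stage a plain function means a document that arrives with gaps flows on to reconstruction instead of being dropped at extraction.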
Getting Past Implementation Obstacles
The biggest breakthrough came from addressing the client's requirement for template adherence. Working closely with our in-house subject matter experts, we built a complex workaround that transformed the client team's data processing capabilities.
Reconstructing Data Intelligently
To guarantee that every document satisfies the precise template structure the client's system requires, we developed automated systems that detect and reconstruct missing data points.
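The first half of that step, detecting which template fields are missing, can be sketched as a completeness check that routes incomplete records to reconstruction instead of rejecting them as "data deficient". The template fields, `missing_fields`, and `triage` below are hypothetical; the client's actual template is not described.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]

# Hypothetical template: the fields the client's system requires.
TEMPLATE: Tuple[str, ...] = (
    "revenue", "cost_of_sales", "gross_profit",
    "total_assets", "total_liabilities", "total_equity",
)


def missing_fields(record: Record, template: Sequence[str] = TEMPLATE) -> List[str]:
    """Template fields that are absent or empty, in template order."""
    return [name for name in template if record.get(name) is None]


def triage(record: Record) -> Tuple[str, List[str]]:
    # Rather than rejecting the document as "data deficient",
    # route it to reconstruction along with the list of gaps.
    gaps = missing_fields(record)
    return ("complete", gaps) if not gaps else ("needs_reconstruction", gaps)


record = {"revenue": 1200.0, "cost_of_sales": 800.0, "total_assets": 5000.0,
          "total_liabilities": 3000.0, "total_equity": 2000.0}
print(triage(record))  # ('needs_reconstruction', ['gross_profit'])
```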
Advanced Document Processing
We applied AI-driven features specifically designed to handle low-quality documents, maintaining a 95% data quality standard across previously inaccessible data sources.
Intelligent Mapping System
We built rule-based extraction systems that treat different labels for the same data point as equivalent, unifying the varied ways financial statements present the same information.
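A minimal sketch of rule-based label unification, assuming a hand-maintained synonym table: labels are normalized (lowercased, punctuation stripped, whitespace collapsed) and then looked up. The table entries, including the Bulgarian example, and the function names are hypothetical; the production rules are not described in the case study.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table: normalized label -> canonical template field.
SYNONYMS: Dict[str, str] = {
    "revenue": "revenue",
    "net sales": "revenue",
    "turnover": "revenue",
    "приходи от продажби": "revenue",   # Bulgarian: "sales revenue"
    "cost of sales": "cost_of_sales",
    "cost of goods sold": "cost_of_sales",
    "total equity": "total_equity",
    "shareholders equity": "total_equity",
}


def normalize_label(label: str) -> str:
    # Lowercase, drop punctuation (\w is Unicode-aware), collapse whitespace.
    label = re.sub(r"[^\w\s]", "", label.lower())
    return re.sub(r"\s+", " ", label).strip()


def map_label(label: str) -> Optional[str]:
    return SYNONYMS.get(normalize_label(label))


def map_record(raw: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Rename known labels to template fields; report labels with no rule."""
    mapped: Dict[str, float] = {}
    unmapped: List[str] = []
    for label, value in raw.items():
        field = map_label(label)
        if field is None:
            unmapped.append(label)
        else:
            mapped[field] = value  # a later synonym overwrites an earlier one
    return mapped, unmapped


print(map_record({"Net Sales:": 1200.0, "Shareholders' Equity": 2000.0,
                  "Goodwill": 150.0}))
# ({'revenue': 1200.0, 'total_equity': 2000.0}, ['Goodwill'])
```

Returning the unmapped labels, rather than silently dropping them, gives reviewers a worklist for new rules.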
Automated Value Derivation
We built systems that fill in missing information by automatically transforming and calculating values from notes and indirect references while keeping the data internally consistent.
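One part of value derivation can be illustrated with standard accounting identities: when exactly one term of an identity is missing, it can be solved for, and when all terms are present, the identity doubles as a consistency check. This sketch covers only that arithmetic, not reading values out of notes or indirect references; the identity list, field names, `tol`, and function names are assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

# Hypothetical identities, each of the form total = part_a + part_b.
IDENTITIES = (
    ("total_assets", "total_liabilities", "total_equity"),  # A = L + E
    ("revenue", "cost_of_sales", "gross_profit"),           # Rev = COGS + GP
)


def derive_missing(record: Record) -> Record:
    """Fill gaps from identities with exactly one unknown, until nothing changes."""
    out = dict(record)
    changed = True
    while changed:
        changed = False
        for total, a, b in IDENTITIES:
            t, x, y = out.get(total), out.get(a), out.get(b)
            if [t, x, y].count(None) != 1:
                continue  # zero unknowns: nothing to do; two or more: unsolvable
            if t is None:
                out[total] = x + y
            elif x is None:
                out[a] = t - y
            else:
                out[b] = t - x
            changed = True
    return out


def inconsistencies(record: Record, tol: float = 0.5) -> List[str]:
    """Totals whose identity is violated by more than tol (e.g. beyond rounding)."""
    bad = []
    for total, a, b in IDENTITIES:
        t, x, y = record.get(total), record.get(a), record.get(b)
        if None not in (t, x, y) and abs(t - (x + y)) > tol:
            bad.append(total)
    return bad


filled = derive_missing({"total_assets": 5000.0, "total_equity": 2000.0,
                         "revenue": 1200.0, "gross_profit": 400.0})
print(filled["total_liabilities"], filled["cost_of_sales"])  # 3000.0 800.0
print(inconsistencies(filled))  # []
```

The loop repeats because, with a longer identity list, one derived value can unlock another; it always terminates since every pass that changes anything fills at least one previously missing field.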
Business Impact
Information coverage increased
Data accuracy increased
Processing time reduced
The system eliminated almost all of the data gaps that previously limited the database's coverage, increasing coverage from 60% to 99%.
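Using only the two coverage figures above, "almost all" can be made concrete by comparing the uncovered share before and after:

```latex
\begin{aligned}
\text{gap}_{\text{before}} &= 100\% - 60\% = 40\%\\
\text{gap}_{\text{after}}  &= 100\% - 99\% = 1\%\\
\text{share of gap closed} &= \frac{40\% - 1\%}{40\%} = 97.5\%
\end{aligned}
```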
Language coverage increased from 8–10 languages to 40+, enabling comprehensive regional coverage and establishing the client as the go-to source for financial information.
Data accuracy improved from 70–75% to 98%, providing a solid basis for financial analysis and decision-making.
By eliminating silent data loss and unlocking previously inaccessible information, we helped the client build the largest comprehensive financial database in the region.
Strategic Business Results
Through this collaboration, the client rose from a regional player with data gaps to the leading source of financial data across all of its target markets. By resolving the template adherence issue and putting intelligent data reconstruction into practice, we helped the client realize its ambitious goal and set new benchmarks for data quality and coverage in the financial information industry.