OCR software for scanning old utility bills - what works best?

Started by Randy Dawson — 2 years ago — 14 views
Working on a large historical audit going back to 2018, and the client only has paper bills for the early years. Need to digitize about 200 MLGW bills for data analysis. Anyone have experience with OCR software that handles utility bill formats well? The standard stuff seems to struggle with the tabular layouts.
Randy, I've had good luck with Adobe Acrobat Pro for OCR work. It's not perfect but handles most utility bill formats reasonably well. The key is scanning at high resolution first - at least 300 DPI. Then you can export to Excel and clean up the data. Oklahoma Gas & Electric bills work pretty well with this approach.
Have you looked into ABBYY FineReader? It's specifically designed for document OCR and seems to handle complex layouts better than general-purpose tools. I used it for some Idaho Power historical bills last year. More expensive than Adobe but might be worth it for a large project like yours.
Tom, I'll check out ABBYY. Cost isn't a major concern if it saves time on the data entry. How accurate was it with the numerical data? That's my biggest worry - getting usage and demand figures wrong because of OCR errors.
Pretty good on clear scans, maybe 95% accuracy on numerical fields. The bigger issue is formatting - it doesn't always preserve the relationship between labels and values. I ended up doing a manual review of every bill anyway, but it still saved probably 70% of the data entry time.
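To put that 95% figure in perspective, here is a back-of-the-envelope sketch of why a full manual review can still be needed. It assumes 10 numeric fields per bill (a made-up number, count your own) and that OCR errors are independent from field to field, which real scans won't strictly satisfy.

```python
# Rough estimate only: per-field accuracy is from the post above,
# fields_per_bill is an assumed value, and errors are assumed independent.
field_accuracy = 0.95
fields_per_bill = 10   # assumption, not from the thread
bill_count = 200       # size of the project described in the first post

# Probability that at least one numeric field on a bill is misread.
p_bill_has_error = 1 - field_accuracy ** fields_per_bill

# Expected number of bills with at least one misread field.
expected_bad_bills = bill_count * p_bill_has_error

print(f"Chance a bill has at least one error: {p_bill_has_error:.1%}")
print(f"Expected bills needing a correction: {expected_bad_bills:.0f} of {bill_count}")
```

Under those assumptions roughly 40% of bills would carry at least one bad number, so reviewing every bill is not overkill.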
Another option is to reach out directly to MLGW. Sometimes utilities will provide historical billing data in electronic format for large audits, especially if there's potential for significant refunds. Worth asking before you spend weeks on OCR work.
Good suggestion Ramona. I'll contact their commercial billing department. Even if they charge a fee for the data extraction, it might be cheaper than the time investment. MLGW has been pretty cooperative on past audits.
If you do go the OCR route, make sure to spot-check the results against the original bills. I've seen situations where the software misreads rate schedule codes, which can throw off the entire analysis. Better to catch those errors early in the process.
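A minimal sketch of how that spot-check could be set up in Python once the OCR output is in a table. The rate codes, field names, and helper functions here are all hypothetical placeholders; swap in the rate schedules that actually appear on your client's bills.

```python
import random

# Hypothetical rate schedule codes expected on the client's bills.
KNOWN_RATE_CODES = {"GS-1", "GS-2", "GS-3"}

def flag_unknown_rate_codes(records, known_codes=KNOWN_RATE_CODES):
    """Return records whose OCR'd rate code is not in the expected set."""
    return [
        r for r in records
        if r["rate_code"].strip().upper() not in known_codes
    ]

def pick_spot_check_sample(records, fraction=0.1, seed=0):
    """Pick a reproducible random subset to compare against the paper bills."""
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)

# Made-up example rows standing in for OCR output.
records = [
    {"bill_id": "2018-01", "rate_code": "GS-2"},
    {"bill_id": "2018-02", "rate_code": "G5-2"},   # 'S' misread as '5'
    {"bill_id": "2018-03", "rate_code": "gs-2 "},  # case/whitespace noise only
]

flagged = flag_unknown_rate_codes(records)
sample = pick_spot_check_sample(records, fraction=0.34)
```

Anything in `flagged` gets checked against the original bill right away, and the random `sample` covers the bills that passed the code check but might still have bad usage or demand figures.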
Randy, following up on this thread - did MLGW provide the historical data electronically? I'm starting a similar project with MidAmerican Energy and wondering whether I should try the direct approach before investing in OCR software.