Abstract

We present a practical method for improving the OCR accuracy of degraded typewritten document images. Our method selects a restoration algorithm individually for each document to be processed, based on a comprehensive assessment of that document's quality. The assessment quantifies the severity of a variety of document degradations, such as background speckle, touching characters, and broken characters. A statistical classifier then uses these measures to select the optimal restoration method for the document at hand. On a 41-document corpus, our methodology improved the corpus OCR character accuracy by 24% and the word accuracy by 30%.
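The abstract describes a two-stage pipeline: measure how badly a page is degraded, then let a classifier choose a restoration method from those measurements. The sketch below illustrates that idea only and is not the authors' implementation. Every name, threshold, and centroid value is a hypothetical assumption. It uses connected-component size statistics as crude proxies for speckle, broken characters, and touching characters, and it substitutes a simple nearest-centroid rule for the paper's statistical classifier.

```python
import math
from collections import deque

def connected_components(img):
    """Return sizes of 4-connected components of black (1) pixels in a binary image."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] == 1 and not seen[r][c]:
                # Breadth-first flood fill to measure this component.
                queue = deque([(r, c)])
                seen[r][c] = True
                n = 0
                while queue:
                    y, x = queue.popleft()
                    n += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(n)
    return sizes

def quality_features(img, speckle_max=2, fragment_max=15, touching_min=60):
    """Hypothetical degradation measures: fractions of components that are
    specks (tiny), fragments (proxy for broken characters), or oversized
    blobs (proxy for touching characters). Thresholds are illustrative."""
    sizes = connected_components(img)
    total = len(sizes) or 1  # avoid division by zero on a blank page
    speckle = sum(1 for s in sizes if s <= speckle_max) / total
    broken = sum(1 for s in sizes if speckle_max < s <= fragment_max) / total
    touching = sum(1 for s in sizes if s >= touching_min) / total
    return (speckle, broken, touching)

# Hypothetical restoration labels and class centroids in feature space;
# a real system would learn these from training documents.
CENTROIDS = {
    "no_restoration": (0.05, 0.05, 0.05),
    "despeckle": (0.6, 0.1, 0.05),
    "thicken_strokes": (0.1, 0.6, 0.05),
    "thin_strokes": (0.1, 0.05, 0.6),
}

def select_restoration(features, centroids=CENTROIDS):
    """Nearest-centroid stand-in for the paper's statistical classifier."""
    return min(centroids, key=lambda method: math.dist(features, centroids[method]))
```

For example, a page consisting only of tiny isolated specks yields features `(1.0, 0.0, 0.0)`, which falls nearest the `"despeckle"` centroid. A nearest-centroid rule is used here only because it is short and transparent; any trained classifier could fill the same role.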

M. Cannon, J. Hochberg, and P. Kelly. QUARC: A Remarkably Effective Method for Increasing the OCR Accuracy of Degraded Typewritten Documents. In Proceedings of the 1999 Symposium on Document Image Understanding Technology, pp. 154-158, Annapolis, MD, May 1999. Los Alamos National Laboratory Technical Report LA-UR-99-1233.