We present a method for assessing the quality of a typewritten document image and automatically selecting an optimal restoration method based on that assessment. We use five quality measures that assess the severity of background speckle, touching characters, and broken characters. A linear classifier uses these measures to select a restoration method. On a 41-document corpus, our method improved corpus OCR character accuracy by 24% and word accuracy by 30%.
M. Cannon, J. Hochberg, and P. Kelly. Quality Assessment and Restoration of Typewritten Document Images. International Journal on Document Analysis and Recognition. Volume 2, Number 2, pages 80-89, 1999. Los Alamos National Laboratory Technical Report LA-UR-98-5336.
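The abstract says that a linear classifier maps five quality measures to a choice of restoration method. The sketch below shows only that general selection step as a generic multi-class linear classifier: each candidate method gets a linear score, and the highest score wins. Everything concrete in it is hypothetical. The abstract does not name the restoration methods or the individual measures, and it does not give any weights. The method names, the ordering of the five-element feature vector, and every weight and bias are illustrative assumptions, not the authors' trained classifier.

```python
from typing import Sequence

# Hypothetical candidate restoration methods (not named in the abstract).
METHODS = ("leave_as_is", "despeckle", "thicken_strokes")

# Hypothetical weights: one row of five weights per method, one weight per
# quality measure. The order of the five measures is an assumption.
WEIGHTS = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, -0.5, 0.0, 0.2, 0.0],
    [-0.3, 0.0, 1.2, 0.0, 0.4],
]
# Hypothetical per-method bias terms.
BIASES = [0.1, -0.2, -0.4]


def select_method(measures: Sequence[float]) -> str:
    """Return the method whose linear score w_k . x + b_k is largest."""
    if len(measures) != 5:
        raise ValueError("expected exactly five quality measures")
    scores = [
        sum(w * x for w, x in zip(row, measures)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    # Index of the highest-scoring method (argmax).
    best = max(range(len(scores)), key=scores.__getitem__)
    return METHODS[best]


if __name__ == "__main__":
    print(select_method([0.0, 0.0, 0.0, 0.0, 0.0]))
    print(select_method([1.0, 0.0, 0.0, 0.0, 0.0]))
```

With all measures at zero, the biases alone decide, so the sketch keeps the image as-is. A high value in the first (hypothetical speckle) measure tips the choice toward despeckling. In a real system the weights would be learned from documents whose best restoration method is known.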






