Abstract

A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 281 writers. Documents were characterized by the mean, standard deviation, and skew of five connected component features. Linear discriminant analysis was used to classify new documents, and the classifier was tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy.
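
The document representation and the evaluation split described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`summarize`, `document_vector`, `writer_folds`), the use of population (not sample) statistics, and the reading of "writer-sensitive" as "all documents by one writer stay in the same fold" are assumptions made for illustration. The abstract does not name the five connected component features, so the sketch accepts any per-component feature tuples. The discriminant analysis step itself is omitted.

```python
import math
from collections import defaultdict


def summarize(values):
    """Return (mean, standard deviation, skewness) of one feature
    measured over all connected components of a document.
    Population statistics are used here; that choice is an assumption."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant feature has no spread and no asymmetry.
        return mean, 0.0, 0.0
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return mean, std, skew


def document_vector(components):
    """Turn a list of per-component feature tuples into one flat vector
    holding three statistics per feature (five features give 15 values)."""
    vector = []
    for feature_values in zip(*components):
        vector.extend(summarize(feature_values))
    return vector


def writer_folds(writer_ids, k):
    """Split document indices into k folds so that every document by a
    given writer lands in the same fold (hypothetical helper)."""
    by_writer = defaultdict(list)
    for doc_index, writer in enumerate(writer_ids):
        by_writer[writer].append(doc_index)
    folds = [[] for _ in range(k)]
    # Place the most prolific writers first, always into the smallest fold.
    for writer in sorted(by_writer, key=lambda w: (-len(by_writer[w]), str(w))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_writer[writer])
    return folds
```

Each fold would then serve once as the test set while a classifier is fit on the remaining folds, so that no writer contributes documents to both training and testing in the same round.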

J. Hochberg, K. Bowers, M. Cannon, and P. Kelly. Script and Language Identification for Handwritten Document Images. International Journal on Document Analysis and Recognition. Volume 2, Number 2, pages 45-52, 1999. Los Alamos National Laboratory Technical Report LA-UR-98-5636.