Posted on 12/04/2004 5:46:21 AM PST by ckilmer
Researchers create tool to automatically search handwritten historical documents
Historians and researchers searching through handwritten documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress, now have a powerful new tool to aid their work: a first-of-its-kind manuscript retrieval system developed at the University of Massachusetts Amherst. The search tool was developed by the Center for Intelligent Information Retrieval in the computer science department at UMass Amherst.
R. Manmatha, research assistant professor of computer science, along with graduate students Toni Rath and Victor Lavrenko, has created a demonstration of the search tool using 1,000 scanned pages of Washington's manuscripts. Manmatha says the interface is similar to that of the popular search engine Google.
The scanned pages of Washington's papers can be searched by typing in a word such as "Washington" or "Virginia", and the program produces a ranked list of pages showing where the word appears.
Manmatha says, "Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time-consuming and expensive to do." "Given the cost, most handwritten documents are never transcribed or indexed," Manmatha says. "But there is an enormous amount of handwritten, historical material."
According to Toni Rath, "The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images. That is, the word images form a visual language and the transcriptions are in English. Once the model is learned, it may be used for searching scanned pages for which no transcriptions are available."
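To make Rath's cross-language analogy concrete, here is a minimal Python sketch. It assumes each word image has already been reduced to a handful of discrete "visual terms"; the feature names (such as `cap_W`), the class `CrossModalModel`, and the function `rank_pages` are all hypothetical. The model learns from a small parallel set of (visual terms, transcription) pairs using a simplified relevance-model-style estimate with linear smoothing. It then ranks untranscribed pages by the expected number of times the query word appears on each. This illustrates the general idea only and is not the UMass system's actual features or model.

```python
import math
from collections import Counter

# Hypothetical parallel training item: (visual terms of a word image, transcription).
TrainItem = tuple[list[str], str]


class CrossModalModel:
    """Toy model linking discrete visual terms to transcribed English words.

    Each training image J acts as a mixture component; P(w | J) and P(f | J)
    are smoothed toward collection-wide (background) statistics.
    Assumes alpha, beta > 0 and non-empty training data.
    """

    def __init__(self, train: list[TrainItem], alpha: float = 0.1, beta: float = 0.1):
        self.train = train
        self.alpha = alpha  # smoothing weight for word probabilities
        self.beta = beta    # smoothing weight for visual-term probabilities
        words = Counter(w for _, w in train)
        feats = Counter(f for fs, _ in train for f in fs)
        self.p_word_bg = {w: c / len(train) for w, c in words.items()}
        total_feats = sum(feats.values())
        self.p_feat_bg = {f: c / total_feats for f, c in feats.items()}
        self.feat_counts = [Counter(fs) for fs, _ in train]

    def _log_p_feats(self, feats: list[str], j: int) -> float:
        # log of prod_i P(f_i | J), with unseen terms given a tiny background mass
        counts = self.feat_counts[j]
        size = max(sum(counts.values()), 1)
        total = 0.0
        for f in feats:
            p = (1 - self.beta) * counts[f] / size + self.beta * self.p_feat_bg.get(f, 1e-6)
            total += math.log(p)
        return total

    def _p_word(self, word: str, j: int) -> float:
        # P(w | J): mostly J's own transcription, smoothed toward the vocabulary
        exact = 1.0 if self.train[j][1] == word else 0.0
        return (1 - self.alpha) * exact + self.alpha * self.p_word_bg.get(word, 0.0)

    def posterior(self, word: str, feats: list[str]) -> float:
        """P(word | visual terms) for an untranscribed word image."""
        logs = [self._log_p_feats(feats, j) for j in range(len(self.train))]
        m = max(logs)
        weights = [math.exp(lg - m) for lg in logs]  # rescale to avoid underflow
        num = sum(wt * self._p_word(word, j) for j, wt in enumerate(weights))
        return num / sum(weights)


def rank_pages(model: CrossModalModel, pages: dict[str, list[list[str]]], query: str):
    """Rank pages by the expected number of occurrences of `query`."""
    scores = {
        page_id: sum(model.posterior(query, feats) for feats in word_images)
        for page_id, word_images in pages.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical transcribed training images and untranscribed pages.
train = [
    (["loop_tall", "cap_W", "long_word"], "Washington"),
    (["cap_V", "descender", "medium_word"], "Virginia"),
    (["short_word", "round"], "the"),
]
model = CrossModalModel(train)
pages = {
    "page-1": [["cap_W", "long_word", "loop_tall"], ["short_word", "round"]],
    "page-2": [["cap_V", "medium_word"], ["short_word"]],
}
print(rank_pages(model, pages, "Washington")[0][0])  # page-1
```

The smoothing plays the role that translation probabilities play in cross-language search. An untranscribed image that shares visual terms with a transcribed "Washington" image inherits most of that word's probability. Because each P(w | J) sums to one over the vocabulary, the posterior for a single image is a proper distribution over the training words.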
A research paper describing the work was presented this summer at the leading information retrieval conference, the 27th Annual International ACM SIGIR Conference, in Sheffield, England. The work is partly funded by a grant from the National Science Foundation and the National Endowment for the Humanities.
More: Handwriting Retrieval Demonstration
November 29, 2004. Office of News & Information
Perhaps this will be useful in determining the content of my own handwritten notes.
I played around with the site for half a dozen or so searches; each hit had absolutely nothing to do with the search term. I think they need to spend a bit more time on it.
Thanks. GGG alert.
BTTT
One of the few inspired pieces of computer software, askSam, is part of the solution. It doesn't read handwriting or hear speech, but there are other tools to do that.