Free Republic
Browse · Search
News/Activism
Topics · Post Article

Researchers create tool to automatically search handwritten historical documents
University of Massachusetts, Amherst ^ | November 29, 2004 | Office of News & information

Posted on 12/04/2004 5:46:21 AM PST by ckilmer

Historians and researchers searching through handwritten documents, such as the 140,000 pages that make up George Washington’s personal papers in the Library of Congress, now have a powerful new tool to aid their work: a first-of-its-kind manuscript retrieval system developed by the Center for Intelligent Information Retrieval in the computer science department at the University of Massachusetts Amherst.

R. Manmatha, research assistant professor of computer science, and graduate students Toni Rath and Victor Lavrenko have created a demonstration of their search tool using 1,000 scanned pages of Washington’s manuscripts. Manmatha says the computer interface is similar to that of the popular search engine Google.

The scanned pages of Washington’s papers can be searched by typing in a word such as “Washington” or “Virginia,” and the program produces a ranked list of the pages where that word appears.

“Right now, searching a scanned handwritten document is very hard to do,” Manmatha says. “Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time-consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed. But there is an enormous amount of handwritten, historical material.”

According to Toni Rath, “The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images. That is, the word images form a ‘visual language’ and the transcriptions are in English.” Once the model is learned it may be used for searching scanned pages for which no transcriptions are available.
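The analogy Rath describes can be illustrated with a toy sketch. This is not the UMass system: it assumes each word image has already been segmented and reduced to a discrete "visual token" (the labels such as "v1" are hypothetical), learns P(English word | visual token) by counting over a small transcribed training set, and then ranks untranscribed pages by the average probability that their word images match the query. All function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Estimate P(word | visual_token) from transcribed word images.

    pairs: iterable of (visual_token, transcribed_word) tuples. The visual
    tokens are a hypothetical stand-in for features of a word image.
    """
    counts = defaultdict(Counter)
    for token, word in pairs:
        counts[token][word] += 1
    model = {}
    for token, word_counts in counts.items():
        total = sum(word_counts.values())
        model[token] = {w: n / total for w, n in word_counts.items()}
    return model


def page_score(model, page_tokens, query):
    """Average probability that a word image on the page renders `query`."""
    if not page_tokens:
        return 0.0
    probs = [model.get(token, {}).get(query, 0.0) for token in page_tokens]
    return sum(probs) / len(probs)


def search(model, pages, query):
    """Return ids of pages with a nonzero score, best match first.

    pages: dict mapping page id -> list of visual tokens; no transcription
    of these pages is needed at search time.
    """
    scored = [(page_score(model, tokens, query), page_id)
              for page_id, tokens in pages.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for score, page_id in scored if score > 0]
```

For example, if the training pairs link token "v1" mostly to "Washington" and token "v2" only to "Virginia", a query for "Virginia" ranks pages containing "v2" above pages containing only "v1". A real system would work with continuous image features and a smoothed probabilistic model rather than raw counts.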

A research paper describing the work was presented this summer at the leading information retrieval conference – the 27th Annual International ACM SIGIR conference in Sheffield, England. The work is partly funded by a grant from the National Science Foundation and the National Endowment for the Humanities.

More: Handwriting Retrieval Demonstration

November 29, 2004. Office of News & information


TOPICS: Constitution/Conservatism; Culture/Society
KEYWORDS: automatic; documents; handwritten; search

1 posted on 12/04/2004 5:46:22 AM PST by ckilmer

To: RightWhale

Perhaps this will be useful in determining the content of my own handwritten notes.


2 posted on 12/04/2004 5:51:19 AM PST by aposiopetic

To: ckilmer

I played around with the site for a half dozen or so searches - each hit had absolutely nothing to do with the search term. I think they need to spend a bit more time on it.


3 posted on 12/04/2004 6:01:46 AM PST by rotstan

To: ckilmer; blam

Thanks. GGG alert.


4 posted on 12/04/2004 7:06:00 AM PST by wizr (Love. Take some, pass it on. John 3:16)

To: wizr

BTTT


5 posted on 12/04/2004 11:19:59 AM PST by blam

To: aposiopetic

One of the few inspired pieces of computer software, askSam, is part of the solution. It doesn't read handwriting or hear speech, but there are other tools to do that.


6 posted on 12/04/2004 2:45:25 PM PST by RightWhale (Destroy the dark; restore the light)

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson