So if I understand this correctly, people solve the little tests, jumbled letters, and now those solved tests are actually going to be whole digital books down the road?
As I understand it, they are scanning in old books, and then using the words that the scanner can’t read as test words. When lots of responses are the same for the word, then they know what the word was that the OCR scanner couldn’t read.
You're close. The little tests (CAPTCHAs) are what many websites already use at login or sign-up to tell people apart from bots. This proposal would replace the made-up puzzles with images of words from scanned books that the OCR software couldn't read. Each image would be shown to several different users so their answers could be compared. Rather than relying on the computer's reading of the text, the users' answers would "vote" on the correct text for the image.
This technique would be used to digitize Gutenberg Bibles and other texts that are difficult to scan by conventional means.
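To make the "voting" idea concrete, here is a rough sketch in Python of how the answers for one unreadable word might be tallied. The function name, the thresholds (min_votes, min_share), and the sample answers are all made up for illustration; the actual system may well decide things differently.

```python
from collections import Counter

def consensus_word(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no clear winner.

    min_votes and min_share are hypothetical thresholds chosen for this sketch.
    """
    # Normalize so "Morning " and "morning" count as the same vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if not cleaned:
        return None
    # Find the most common answer and how many users gave it.
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept only if enough users answered and most of them agree.
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None

# Several users are shown the same word image and type what they see.
print(consensus_word(["morning", "Morning", "morning ", "moming"]))  # morning
print(consensus_word(["upon", "npon"]))  # None (too few matching answers)
```

A word that never reaches agreement would presumably just keep being shown to more users, or be set aside for a person to check.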