As I understand it, they are scanning in old books and then using the words that the scanner can’t read as test words. When lots of people give the same response for a word, they know what the word was that the OCR scanner couldn’t read.
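To make the agreement idea concrete, here is a minimal Python sketch of how matching responses could be turned into a word. This is only my guess at the logic: the function name `resolve_word` and the thresholds `min_votes` and `min_share` are made up for illustration and are not taken from the actual service.

```python
from collections import Counter


def resolve_word(responses, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no clear consensus.

    Hypothetical helper: the thresholds are illustrative assumptions.
    """
    # Normalize so "Upon" and " upon " count as the same answer.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        return None

    # Take the most common answer and how many people gave it.
    word, votes = Counter(cleaned).most_common(1)[0]

    # Accept it only if enough people agree, in number and in proportion.
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None


print(resolve_word(["upon", "Upon", "upon ", "upom"]))
print(resolve_word(["morning", "mourning"]))
```

The first call has three matching answers out of four, so it settles on a word. The second has only two answers that disagree, so it stays unresolved until more people respond.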
Thanks for the clue, ’cause I had none! Now, are these unknown words going to be added to various sites to help with what the OCR can’t read now?