Free Republic
News/Activism

To: buzzer

OK - thanks for the reply. Unfortunately, while I can see that the H was mistaken for an X, I have limited understanding of PDF binary files and still do not know if this gives weight to this document being a fraud or if it supports it as being authentic and credible.

Please excuse my ignorance, but if you can clarify that, I’d appreciate it. Thx so much!


92 posted on 04/27/2011 3:29:16 PM PDT by rocco55


To: rocco55

Usually the OCR engine decides which elements it will store as “image” and which it will store as “text”. It tries to store as much as possible as “text” to make the document indexable/searchable. To do so it identifies “blocks” of content. Everything that doesn’t make it into one of the blocks remains on the background, like the fancy security paper background. The blocks of content are then separated further into smaller blocks. The OCR tries to recognize the letters. If it succeeds, it replaces the letter “image” with a “text character”. These content blocks now contain “text” and unrecognized “images”. This content is stored in the content layer, which is an overlay of the background layer.
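
To make the layering described above a bit more concrete, here is a rough Python sketch. It is only an illustration, not how any real OCR engine or PDF writer is implemented: the names `Glyph`, `LayerItem`, `recognize` and `split_layers` are made up for this example, and the "recognizer" is a toy stand-in that only succeeds on a clean single letter or digit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Glyph:
    pixels: str     # stand-in for the scanned image of one mark on the page
    in_block: bool  # True if the engine assigned it to a content block


@dataclass
class LayerItem:
    kind: str       # "text" or "image"
    value: str


def recognize(pixels: str) -> Optional[str]:
    """Toy recognizer (assumption): succeeds only on one clean letter or digit."""
    if len(pixels) == 1 and pixels.isalnum():
        return pixels
    return None


def split_layers(glyphs: List[Glyph]) -> Tuple[List[LayerItem], List[LayerItem]]:
    """Split scanned marks into a background layer and a content layer."""
    background: List[LayerItem] = []
    content: List[LayerItem] = []
    for g in glyphs:
        if not g.in_block:
            # Never made it into a content block: stays on the background.
            background.append(LayerItem("image", g.pixels))
            continue
        ch = recognize(g.pixels)
        if ch is not None:
            # Recognized: the letter image is replaced by a text character.
            content.append(LayerItem("text", ch))
        else:
            # Not recognized: kept as an image inside the content layer.
            content.append(LayerItem("image", g.pixels))
    return background, content


page = [
    Glyph("~security-paper~", in_block=False),
    Glyph("A", in_block=True),
    Glyph("<smudged H>", in_block=True),
    Glyph("1", in_block=True),
]
background, content = split_layers(page)
searchable_text = "".join(item.value for item in content if item.kind == "text")
```

In this toy page the security-paper pattern ends up on the background, the clean "A" and "1" become searchable text, and the smudged letter stays behind as an image in the content layer, so a single document can end up with a mix of text and image pieces.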


108 posted on 04/27/2011 10:36:50 PM PDT by buzzer



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson