Replies

OK - thanks for the reply. Unfortunately, while I can see that the H was mistaken for an X, I have a limited understanding of PDF binary files and still do not know whether this gives weight to the document being a fraud or supports it as being authentic and credible.

Please excuse my ignorance, but if you can clarify that, I’d appreciate it. Thanks so much!

Usually the OCR engine decides which elements it will store as “image” and which it will store as “text”. It tries to store as much as possible as “text” to make the document indexable and searchable. To do so, it identifies “blocks” of content. Everything that doesn’t make it into one of the blocks remains in the background, like the fancy security paper background. The blocks of content are then separated further into smaller blocks, and the OCR tries to recognize the letters. If it succeeds, it replaces the letter “image” with a “text character”. The content blocks now contain a mix of “text” and unrecognized “images”. This content is stored in the content layer, which is an overlay on top of the background layer.
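To make that a bit more concrete, here is a rough Python sketch of the text-or-image decision. It is only a toy model and not how any particular OCR product works: the names (Glyph, build_content_layer, searchable_text), the confidence score, and the 0.8 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Glyph:
    image_id: str          # handle to the cropped letter image (hypothetical)
    guess: Optional[str]   # character the recognizer proposed, or None
    confidence: float      # assumed score between 0 and 1


@dataclass
class TextItem:
    char: str              # letter stored as a searchable text character


@dataclass
class ImageItem:
    image_id: str          # letter left as an unrecognized image


ContentItem = Union[TextItem, ImageItem]


def build_content_layer(glyphs: List[Glyph], threshold: float = 0.8) -> List[ContentItem]:
    """Replace each letter image with text when the recognizer is confident enough."""
    layer: List[ContentItem] = []
    for g in glyphs:
        if g.guess is not None and g.confidence >= threshold:
            layer.append(TextItem(g.guess))      # recognized: store as text
        else:
            layer.append(ImageItem(g.image_id))  # not recognized: keep the image
    return layer


def searchable_text(layer: List[ContentItem]) -> str:
    """Return only what a text search would see in the content layer."""
    return "".join(item.char for item in layer if isinstance(item, TextItem))


if __name__ == "__main__":
    # The first glyph is really an "H", but the recognizer confidently guessed "X".
    glyphs = [
        Glyph("g1", "X", 0.91),
        Glyph("g2", "i", 0.95),
        Glyph("g3", "l", 0.40),  # too uncertain, stays an image
    ]
    print(searchable_text(build_content_layer(glyphs)))  # prints "Xi"
```

In this sketch, a confident but wrong guess ends up in the text layer exactly like a correct one, so a mistake such as an H stored as an X would show up in the searchable text even though the page still looks right to the eye.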