Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: Seizethecarp

And BTW, why use a machine that uses OCR when you want a mirror image of the document, a bit-by-bit copy? There is no need to determine which character is which. Is it a “6” or an “8”? You would not care. You want a picture, a photograph-like image.

These clowns are blowing smoke up our butts.


161 posted on 08/10/2013 7:32:59 PM PDT by Red Steel
[ Post Reply | Private Reply | To 159 | View Replies ]


To: Red Steel

“...why use a machine that uses OCR when you want a mirror image of the document, a bit-by-bit copy? There is no need to determine which character is which. Is it a “6” or an “8”? You would not care. You want a picture, a photograph-like image.”

NBC is not claiming that OCR recognized a “6” or an “8” as a number. NBC contends that the Xerox copier uses an internationally standardized compression scheme called “Mixed Raster Content” (MRC), which incorporates a technique called “image segmentation” to identify borders and shapes. If a shape is repeated closely, then the first one is copied rather than saving a second or subsequent shape that is only slightly different.

http://en.wikipedia.org/wiki/Mixed_raster_content

“Mixed raster content, or MRC, is a method for compressing images that contain both binary text and continuous-tone components, using image segmentation methods to improve the level of compression and the quality of the rendered image.[1] By separating the image into components with different levels of compressibility, the most efficient and accurate compression algorithms for each type can be used.”
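To make the "separating the image into components" part concrete, here is a toy sketch of my own, not Xerox's code and not the actual standard. It assumes a grayscale image given as rows of 0-255 values and a simple fixed threshold (both hypothetical choices) and splits it into a binary text mask plus foreground and background layers, then puts it back together.

```python
def split_layers(gray, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into three layers.

    Assumption for illustration: any pixel darker than `threshold` is text.
    Real segmentation is far more elaborate than a fixed threshold.
    """
    # Binary mask: 1 where the pixel is treated as text, 0 elsewhere.
    mask = [[1 if p < threshold else 0 for p in row] for row in gray]
    # Foreground holds only the text pixels, background only the rest.
    foreground = [[p if m else None for p, m in zip(row, mrow)]
                  for row, mrow in zip(gray, mask)]
    background = [[None if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(gray, mask)]
    return mask, foreground, background


def merge_layers(mask, foreground, background):
    """Rebuild the image: the mask picks foreground or background per pixel."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


image = [[250, 20, 245],
         [30, 240, 25]]
mask, fg, bg = split_layers(image)
restored = merge_layers(mask, fg, bg)
```

In this toy nothing is lost, because no layer is actually compressed. The point of the split is that each layer could then be handed to a different compressor, which is where savings and small differences from the original can come in.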

http://en.wikipedia.org/wiki/Segmentation_(image_processing)

“Compression based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data.[5][6] The connection between these two concepts is that segmentation tries to find patterns in an image and any regularity in the image can be used to compress it.”

So when the compressed JPEG is opened in Preview on a Mac and then saved as a PDF file, instead of four similar lowercase letter “e” images appearing in the LFBC, there is only one “e” repeated four times, IIRC. If you read the NBC blog, there is a lot of explanation of this.
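Here is a rough sketch of that "copy the first shape" idea, just to show the principle. It is my own illustration, not the copier's algorithm: the glyphs are tiny made-up bitmaps, and the function names and the one-pixel tolerance (`max_diff`) are assumptions.

```python
def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def same_size(a, b):
    return len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b))


def compress_glyphs(glyphs, max_diff=1):
    """Store each distinct shape once; near-duplicates reuse the first match.

    Returns (dictionary, indices), where indices[i] says which stored
    shape stands in for glyph i.
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, ref in enumerate(dictionary):
            if same_size(ref, glyph) and hamming(ref, glyph) <= max_diff:
                indices.append(i)  # close enough: reuse the earlier shape
                break
        else:
            dictionary.append(glyph)  # genuinely new shape: store it
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress_glyphs(dictionary, indices):
    return [dictionary[i] for i in indices]


# Hypothetical 4x5 bitmaps: two slightly different scans of "e" and one "o".
E_A = (".##.", "#..#", "####", "#...", ".###")
E_B = (".##.", "#..#", "####", "#...", ".##.")  # one pixel off from E_A
O = (".##.", "#..#", "#..#", "#..#", ".##.")

dictionary, indices = compress_glyphs([E_A, O, E_B, E_A])
restored = decompress_glyphs(dictionary, indices)
```

After the round trip, the slightly different second "e" (E_B) comes back as an exact copy of the first one (E_A), which is the kind of identical repeated letter described above.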


164 posted on 08/10/2013 8:52:20 PM PDT by Seizethecarp (Defend aircraft from "runway kill zone" mini-drone helicopter swarm attacks: www.runwaykillzone.com)
[ Post Reply | Private Reply | To 161 | View Replies ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson