And BTW, why use a machine that uses OCR when you want a mirror image of the document, a bit-by-bit copy? There was no need to determine what character is what. Would you need to know whether it is a “6” or an “8”? You would not. You want a picture, a photograph-like image.
These clowns are blowing smoke up our butts.
“...why use a machine that uses OCR when you want a mirror image of the document, a bit-by-bit copy? There was no need to determine what character is what. Would you need to know whether it is a 6 or an 8? You would not. You want a picture, a photograph-like image.”
NBC is not claiming that OCR recognized a “6” or an “8” as a number. NBC contends that the Xerox copier uses software implementing an international JPEG standard called “Mixed Raster Content” (MRC), which incorporates an algorithm called “image segmentation” to identify borders and shapes. If a shape closely matches an earlier one, the first one is reused rather than saving a second or subsequent shape that is only slightly different.
http://en.wikipedia.org/wiki/Mixed_raster_content
“Mixed raster content, or MRC, is a method for compressing images that contain both binary text and continuous-tone components, using image segmentation methods to improve the level of compression and the quality of the rendered image.[1] By separating the image into components with different levels of compressability, the most efficient and accurate compression algorithms for each type can be used.”
http://en.wikipedia.org/wiki/Segmentation_(image_processing)
“Compression based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data.[5][6] The connection between these two concepts is that segmentation tries to find patterns in an image and any regularity in the image can be used to compress it.”
So when the compressed JPEG is opened in Preview on a Mac and then saved as a PDF file, instead of four similar lowercase letter “e” images appearing in the LFBC, there is only one “e” repeated four times, IIRC. If you read the NBC blog, there is a lot of explanation of this.
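For anyone who wants to see the idea in miniature, here is a rough Python sketch of "store the first shape, reuse it for near-duplicates." To be clear, this is a toy I made up for illustration, not the copier's actual code and not the MRC standard itself. The 5x5 bitmaps, the function names (`hamming`, `compress_glyphs`, `render`) and the `max_diff` threshold are all my own assumptions.

```python
# Toy illustration only: NOT the copier's real algorithm.
# Each glyph is a tiny bitmap given as a tuple of strings ('#' = ink, '.' = blank).

def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress_glyphs(glyphs, max_diff=2):
    """Keep one copy of each distinct shape; near-duplicates become references.

    max_diff is a hypothetical similarity threshold (in pixels).
    Returns (dictionary, refs), where refs[i] indexes into dictionary.
    """
    dictionary, refs = [], []
    for g in glyphs:
        for idx, stored in enumerate(dictionary):
            same_size = len(stored) == len(g) and len(stored[0]) == len(g[0])
            if same_size and hamming(stored, g) <= max_diff:
                refs.append(idx)          # close enough: reuse the earlier shape
                break
        else:
            dictionary.append(g)          # new shape: store it
            refs.append(len(dictionary) - 1)
    return dictionary, refs

def render(dictionary, refs):
    """Rebuild the page from the stored shapes and the references."""
    return [dictionary[i] for i in refs]

# Four slightly different scans of a lowercase "e", plus one "o".
E1 = (".###.", "#...#", "#####", "#....", ".###.")
E2 = (".###.", "#...#", "#####", "#...#", ".###.")  # 1 pixel off from E1
E3 = ("####.", "#...#", "#####", "#....", ".###.")  # 1 pixel off from E1
E4 = (".###.", "#...#", "#####", "#....", ".####")  # 1 pixel off from E1
O = (".###.", "#...#", "#...#", "#...#", ".###.")   # 4 pixels off from E1

dictionary, refs = compress_glyphs([E1, E2, O, E3, E4])
rendered = render(dictionary, refs)
```

After this runs, only two shapes are stored, and every "e" in `rendered` is a pixel-for-pixel copy of the first one. No step ever decides that a shape "is an e"; it only compares pixels, which is why this is not OCR.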