Did you know?

Every scanning workflow should include a cleanup phase
Why is optimizing scanned documents so important? Besides improving the readability and visual appearance of the files, cleaning up scanned documents has other benefits.
Any detection engine, such as OCR, produces better results on a clean document. The same holds for recognizing barcodes, checkboxes in exam forms, special fonts on checks, and any other element.
You also get better compression results on cleaned-up documents. Tools like hyper-compression ensure the best quality/readability ratio for your PDFs and sometimes even improve scanned documents' readability, thanks to many optimization algorithms.
Once your documents are cleaned up, you can compress and convert them to PDF/A for long-term archiving and preservation. People who will use your documents in the future will thank you for this!
Scanned documents can be noisy
Scanned documents often contain unwanted, randomly scattered artifacts known as “noise.” In the imaging domain, we even have “salt and pepper noise,” which consists of bright pixels in darker areas and dark pixels in brighter areas, as if someone had sprinkled salt and pepper over the document (imaging likes metaphors).
There are many filters to remove noise from a scanned document.
The Despeckle filter removes noise from images without blurring edges. It attempts to detect complex areas and leave these intact while smoothing areas where noise will be noticeable. Despeckle can clean up dirty or faded drawings that show spots or speckles after scanning.
The Median filter reduces noise by replacing each pixel with the median brightness of the pixels around it. The filter examines the neighborhood of each pixel, which effectively discards values that differ too much from the adjacent pixels, and replaces the center pixel with the median brightness value of that neighborhood. It helps eliminate or reduce the appearance of motion in an image or undesirable patterns that may appear in a scanned image.
Median filtering particularly enhances OCR results because it removes noise but preserves edges.
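To make the idea concrete, here is a minimal sketch of a 3x3 median filter in plain Python. It assumes a grayscale image stored as a list of rows of integers from 0 to 255. The function name and the choice to clamp coordinates at the page edges are illustrative assumptions, not the behavior of any particular product.

```python
from statistics import median

def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighborhood.

    `image` is a list of rows of grayscale values (0-255).
    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbors = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    neighbors.append(image[ny][nx])
            # Nine values, so the median is always one of the inputs.
            result[y][x] = int(median(neighbors))
    return result

# A white page with a single "pepper" pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter_3x3(noisy)
```

In this sketch an isolated speck is outvoted by its eight neighbors and disappears, while a sharp boundary between a dark and a light region stays where it was, which is the edge-preserving property mentioned above.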
There are many ways to enhance your scanned document
Skew is an artifact that can appear during scanning: the document’s text and images end up rotated at a slight angle. Most of the time, it occurs when the paper is misaligned in the scanner. Autodeskew is the process of detecting and fixing this issue on scanned files, so that the text and images in deskewed documents are correctly aligned.
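The detection half of deskewing can be sketched with a projection-profile search, which is one common approach and not necessarily the one any given product uses. The sketch assumes the dark pixels of the page are available as (x, y) coordinates. The function name, the search range of plus or minus 5 degrees, and the 0.1 degree step are illustrative assumptions.

```python
import math

def estimate_skew_degrees(dark_pixels, max_angle=5.0, step=0.1):
    """Estimate the skew angle of a page with a projection-profile search.

    `dark_pixels` is an iterable of (x, y) coordinates, with y growing
    downward. For each candidate angle, every pixel is projected back onto
    the row it would occupy if text lines had that slope. The angle that
    concentrates pixels into the fewest rows scores highest.
    """
    pixels = list(dark_pixels)
    steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        profile = {}
        for x, y in pixels:
            row = round(y - x * slope)
            profile[row] = profile.get(row, 0) + 1
        # Sum of squares rewards sharp peaks (well-aligned text lines).
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Three synthetic "text lines" drawn with a 2 degree slope.
line_slope = math.tan(math.radians(2.0))
page = [(x, y0 + round(x * line_slope)) for y0 in (20, 40, 60) for x in range(200)]
estimated = estimate_skew_degrees(page)
```

Once the angle is known, the fix is to rotate the page by the opposite angle. A real implementation would also need to binarize the scan first and cope with pictures and multi-column layouts, which this sketch ignores.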
Deskewing improves character recognition accuracy because aligned text is much closer to what OCR software expects to encounter when it analyzes an image.
Brightness and contrast are well-known image adjustments, and they are particularly important for scanned documents because they can significantly improve readability.
Gamma correction is often forgotten, but adjusting the gamma of a very light scan can make it readable without darkening the whole page. Its purpose is to optimize contrast and brightness in the mid-tones while leaving pure black and pure white unchanged.
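As a rough illustration, gamma correction is a power-law curve applied to each pixel's normalized brightness. The sketch below assumes an 8-bit grayscale image stored as a list of rows. The function name and the `exponent` parameter are illustrative: tools differ on whether they expose the exponent or its inverse as "gamma", so check the convention of whichever tool you use.

```python
def apply_gamma(image, exponent):
    """Apply a power-law tone curve to an 8-bit grayscale image.

    out = 255 * (in / 255) ** exponent
    An exponent above 1 darkens the mid-tones, an exponent below 1
    lightens them. Pure black (0) and pure white (255) never move.
    """
    # Precompute the curve once, then look each pixel up in the table.
    lookup = [round(255 * (value / 255) ** exponent) for value in range(256)]
    return [[lookup[pixel] for pixel in row] for row in image]

# A washed-out scan: faint gray text (200) on a white background (255).
faded = [[255, 200, 255], [200, 200, 255]]
darker_text = apply_gamma(faded, 2.0)
```

Because the curve is pinned at both ends, the white background stays white while the faint text moves toward black, which a plain brightness reduction cannot do.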
A crop tool is useful when you need to cut out unwanted areas of a page. And if you need to remove black borders and punch holes, our clean-up widget will do it for you!
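For readers curious about what black-border removal involves, here is a minimal sketch under simple assumptions: the image is a list of rows of grayscale values, and a border is any outer row or column made up entirely of pixels darker than a threshold. The function name and the threshold of 64 are illustrative choices, and this is not a description of how the clean-up widget itself works.

```python
def crop_dark_borders(image, threshold=64):
    """Trim outer rows and columns that contain only dark pixels.

    `image` is a list of rows of grayscale values (0-255). A row or
    column counts as border when every pixel in it is below `threshold`.
    """
    def is_dark(values):
        return all(value < threshold for value in values)

    # Trim dark rows from the top and the bottom.
    top, bottom = 0, len(image)
    while top < bottom and is_dark(image[top]):
        top += 1
    while bottom > top and is_dark(image[bottom - 1]):
        bottom -= 1
    rows = image[top:bottom]
    if not rows:
        return []

    # Trim dark columns from the left and the right of what remains.
    left, right = 0, len(rows[0])
    while left < right and is_dark([row[left] for row in rows]):
        left += 1
    while right > left and is_dark([row[right - 1] for row in rows]):
        right -= 1
    return [row[left:right] for row in rows]

# A 6x6 scan: one-pixel black frame around a white page with one ink dot.
scan = [[0] * 6] + [[0, 255, 255, 255, 255, 0] for _ in range(4)] + [[0] * 6]
scan[2][2] = 0
page = crop_dark_borders(scan)
```

Dark content inside the page survives because only rows and columns that are dark from end to end are removed. Real scans usually have uneven or slightly tilted borders, so this works best after deskewing.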