How to Increase the JPEG Quality from your Scanner!

Many photographers still find themselves faced with a need to scan hundreds of their old photos. While some scanner drivers output bitmaps (.BMP) or JPEGs, not all utilities provide the ability to select the level of compression / quality of the saved JPEG images. This article shows you how to increase the JPEG quality output from your scanner!

Introduction

The task of scanning photos can be a laborious one. Many of us digital photographers have stacks of old photo albums that we hope to get around to scanning someday into our digital asset management (DAM) software. But, just as you must choose a quality level when ripping your CDs to MP3s, you must decide what quality to use when saving the scanned photos.

Choosing a poor image quality means that your collection is only preserved (through backups) at a relatively low quality. When it comes time to reprint from your scans, you may wish you had saved at a better quality back when you spent the time scanning!

On the other hand, most scanner drivers provide the option to save as Windows Bitmaps (.bmp). This would be the highest quality, but the file format does not use any compression. So you end up wasting nearly ten times as much storage space as is really necessary.

[Images: Before Scanner Mod (left), After Scanner Mod (right)]
Notice the significant reduction in JPEG blocking artifacts and ringing noise. My technique successfully modified the scanner utility to deliver much better JPEG image quality output! In my case, I raised the quality factor from 70 to 97.

Other Uses for this Technique

I have used the following technique to substantially improve the quality of JPEG output from my scanner. The same technique (combined with my Quantization Table listings) has also been used by others to increase the quality of images from their camera phones!

What output options does your scanner driver provide?

Nearly all scanner control panels will provide BMP output, JPEG output and, indirectly, a TWAIN driver output.

  • JPEG - For natural photos, JPEG is really the best file format to use because of its extremely effective compression techniques. However, through lossy compression comes a slight reduction in quality over uncompressed / non-lossy techniques. As a photographer, you'll have to make the tradeoff decision.
  • BMP / TIFF - Lacking an efficient lossy compression scheme, these file formats will consume huge amounts of disk space. Therefore, they are best used when the utmost quality must be extracted from the scanner. For archiving large collections of snapshots, this is not usually the best choice.
  • TWAIN Driver - The scanner utility also generally offers the ability to import scanner data directly into the image editor of your choice (such as Photoshop). For individual photo scans, this may be the best choice (as you can work in Photoshop's 16-bit mode), but it is largely unsuitable for batch scanning (i.e. scanning large collections).

Now that you've decided to use your built-in scanner toolbox / utility to save out to the JPEG file format, you need to consider what quality options are available.

An assortment of scanners

I have gone through 5 different scanner models from Canon, HP and other manufacturers over the years. I started out with a bulky flatbed, then went on to a conveniently-thin LiDE model. Discovering some of the color limitations of LiDE models, I then made a choice to return to the bulkier CCD-based flatbed models. While I preferred the scanner control panel flexibility provided by some other manufacturers, I prefer the actual scan quality from my current hardware choice.

The most useful feature for Bulk Scanning of Photos

One feature provided by many of these scanner control panels is called Auto-Crop (or Multi-Crop). It allows you to place multiple photos on the flatbed's platen, press a button, and the individual photos are automatically identified and cropped out into their own files. With a standard-sized scanner, I can get 3 4x6's cropped reliably, and occasionally a fourth.

The time savings afforded with this feature outweigh the benefits I've seen with other methods.

What, no JPEG quality settings??

As much as I liked my scanner, I discovered very quickly that the automated methods provided no option to set the JPEG compression quality level!

This is almost unbelievable, especially on one of the top scanners available from this manufacturer. I desire the time-savings of the Auto-Crop automation, combined with reasonable image quality output, while not wasting significant file space with inefficient file formats.

I contacted Technical Support and was told that there is no way to control the JPEG compression quality used.

Warning! Warning!

The following article involves hacking your scanner utility software. This is provided for entertainment purposes only. You will need to read the license agreement of your scanner software to ensure that modifying it does not violate any terms. In light of this, I am not providing filenames and file offset values in this tutorial.

That said, this is also quite a complicated process and should only be done by those who are relatively comfortable with the topics covered herein (JPEG compression, quantization tables / quality and hex editors).

How I modified the Scanner's JPEG Compression Quality!

Not happy hearing that there was no way to improve my wonderful scanner's output, and armed with a reasonable understanding of JPEG compression, I set out to dig a little deeper.

Easier Method Alert!

With the introduction of JPEGsnoop v1.0.0, you can now automatically locate most DQT tables with the Search Executable for DQT option! For a brief introduction to this option, please see the Interesting Uses page. The steps shown below were done prior to the release of this new time-saving feature.

Step 1 - What amount of compression is used?

  • Scan a photo and save out to a JPEG file
  • Open and analyze the image with JPEGsnoop
  • Extract the Quantization Table (listed under the DQT Heading) for both Luminance and Chrominance
    JPEGsnoop screenshot
    Take special note of the Approximate Quality Factor. Is it the same for both Luminance and Chrominance?
  • Look at the AnnexRatio section -- are the numbers nearly all the same?
    JPEGsnoop screenshot

    In this case, I see a strong trend suggesting that the scanner driver is saving JPEG images with a quantization table that is based on a linear multiple of the JPEG Standard's Annex K values. This is very common among software tools and even some digital cameras.

If we have confirmed that both the Luminance and Chrominance quantization tables have a strong correlation with the JPEG Standard suggested tables (as determined by both: consistent AnnexRatio values and similar Quality Factor value for both tables), then we can presume that the actual quantization table used in the software is probably calculated dynamically (at run-time) from the suggested tables in the JPEG Standard Annex K.
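
As a rough sketch of the check described above, the following Python computes per-entry ratios against the Annex K luminance table and turns the typical ratio into a quality estimate. It assumes the commonly used scaling scheme in which a quality of 50 or more maps to a scale of (200 - 2 x quality) percent; the function names are illustrative, the `observed` table is a stand-in generated from the standard table rather than real scanner output, and JPEGsnoop's own Approximate Quality Factor calculation may differ in detail.

```python
from statistics import median

# Annex K luminance table from the JPEG standard, in row order.
STD_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def annex_ratios(observed, standard=STD_LUMA):
    """Per-entry ratio of an observed table to the standard one, in percent."""
    return [100.0 * o / s for o, s in zip(observed, standard)]


def estimate_quality(observed, standard=STD_LUMA):
    """Return (estimated quality, largest deviation of any ratio from the median)."""
    ratios = annex_ratios(observed, standard)
    mid = median(ratios)
    spread = max(abs(r - mid) for r in ratios)
    # Invert the assumed mapping from quality to scale factor.
    quality = (200.0 - mid) / 2.0 if mid <= 100.0 else 5000.0 / mid
    return quality, spread


# Stand-in for a table copied out of JPEGsnoop: the standard table scaled to 60%.
observed = [(v * 60 + 50) // 100 for v in STD_LUMA]
quality, spread = estimate_quality(observed)
print(f"estimated quality ~{quality:.1f}, ratio spread +/-{spread:.1f} points")
```

A small spread with a similar estimate for both the luminance and chrominance tables would support the "scaled standard table" presumption; a large spread suggests a custom, possibly hard-coded table.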

Since the tables are calculated dynamically (as opposed to hard-coded), it makes my work a fair bit harder.

If it turns out that your tables don't seem to match the Standard tables, then you might have a far easier time with the modification. Instead of trying to reverse engineer the table generation (steps 4 and 5), you can search directly for the output table and modify it accordingly (no use of the formula).

Step 2 - Calculate the table in hexadecimal

Later steps will require us to search an executable for a give-away hexadecimal string. In this step we will calculate that sequence.

Look at the JPEG standard's quantization table, and convert the values to hex (if you're really stuck, you can use an online hex calculator). The hex table values may be represented inside the software with either 1, 2 or 4 bytes per number. I decided to start my search with 2-byte values.

Size (bytes)   C Data Type            Example        Total DQT Table Size (bytes)
1              unsigned char (byte)   5C             64
2              unsigned short         5C 00          128
4              unsigned int           5C 00 00 00    256

NOTE: The above table assumes a little-endian notation, which is the most likely arrangement for multi-byte numbers stored on Windows PCs.

The following conversion of the JPEG Standard's [Annex K] luminance table into hex assumes 2-byte values (I might have had to redo the same process with 1- or 4-byte integers instead if my 2-byte search came up empty).

Standard DQT Luminance (decimal)
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

Standard DQT Luminance (hex short, in zigzag scan order)
10 00 0B 00 0C 00 0E 00 0C 00 0A 00 10 00 0E 00
0D 00 0E 00 12 00 11 00 10 00 13 00 18 00 28 00
1A 00 18 00 16 00 16 00 18 00 31 00 23 00 25 00
1D 00 28 00 3A 00 33 00 3D 00 3C 00 39 00 33 00
38 00 37 00 40 00 48 00 5C 00 4E 00 40 00 44 00
57 00 45 00 37 00 38 00 50 00 6D 00 51 00 57 00
5F 00 62 00 67 00 68 00 67 00 3E 00 4D 00 71 00
79 00 70 00 64 00 78 00 5C 00 65 00 67 00 63 00

Now, the same done with the chrominance table...

Standard DQT Chrominance (decimal)
 17  18  24  47  99  99  99  99
 18  21  26  66  99  99  99  99
 24  26  56  99  99  99  99  99
 47  66  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99

Standard DQT Chrominance (hex short, in zigzag scan order)
11 00 12 00 12 00 18 00 15 00 18 00 2F 00 1A 00
1A 00 2F 00 63 00 42 00 38 00 42 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
63 00 63 00 63 00 63 00 63 00 63 00 63 00 63 00
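
The hex listings above follow the JPEG zigzag scan order rather than the row order of the decimal tables. The conversion can be sketched in Python as below; the helper names `zigzag` and `to_hex` are illustrative, and the little-endian layout is the same assumption made in the note above.

```python
import struct

STD_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]
STD_CHROMA = [
    17, 18, 24, 47, 99, 99, 99, 99,
    18, 21, 26, 66, 99, 99, 99, 99,
    24, 26, 56, 99, 99, 99, 99, 99,
    47, 66, 99, 99, 99, 99, 99, 99,
] + [99] * 32


def zigzag(table):
    """Reorder a row-order 8x8 table (64 values) into JPEG zigzag scan order."""
    cells = [(r, c) for r in range(8) for c in range(8)]
    # Walk the anti-diagonals, alternating direction on each one.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [table[r * 8 + c] for r, c in cells]


def to_hex(values, width=2):
    """Little-endian hex dump using 1, 2 or 4 bytes per value."""
    code = {1: "B", 2: "H", 4: "I"}[width]
    raw = struct.pack(f"<{len(values)}{code}", *values)
    return " ".join(f"{b:02X}" for b in raw)


luma_hex = to_hex(zigzag(STD_LUMA))
chroma_hex = to_hex(zigzag(STD_CHROMA))
print(luma_hex[:47])
print(chroma_hex[:95])
```

Passing `width=1` or `width=4` to `to_hex` produces the alternative representations, should a 2-byte search come up empty.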

Now that I have a table to search for, I want to pick out a representative string from it. For a variety of reasons (mainly to reduce susceptibility to small differences in tables), I decided to select a small range from the chrominance table. I picked a section at the start of the constant sequence, where we see "63 00 63 00" etc.

Sequence selected: 42 00 63 00 63 00 63 00

Step 3 - Look for the hardcoded quantization table

  • Open your favorite hex editor
  • Locate your scanner tool software (should be evident from either watching the Windows Task Manager, or an advanced tool such as FileMon). In my case I found the .exe file within:
    C:\Program Files\<Manufacturer>\<Scanner Utility>\<Utility>.exe
  • Search for the representative hex string
  • If your search comes up empty, try these other searches:
    • Search for the 1-byte representation
    • Search for the 4-byte representation
    • Search for a different part of the standard table
  • Assuming that the program is using linear multiples of the JPEG standard tables (herein called the basis tables), there is a high probability that the table will be stored within the program somewhere. So it is very likely that you'll be able to find it with this mechanism. If all of the above fail, there are other techniques, but they are beyond the scope of this article.
  • In my case, I found the sequence quite easily with 2-byte unsigned shorts (the representation shown above).
  • Now, we need to locate the start of the table(s). By examining the bytes near the search match, work backwards to find the file offsets of the start of the Luminance table and the start of the Chrominance table.
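
The search-and-work-backwards steps above could also be scripted instead of done by hand in a hex editor. The sketch below makes several assumptions: the fragment is the one selected in Step 2, it begins at zigzag position 13 of the chrominance table (the second "42" value), and the table is stored contiguously with a fixed width per value. The function names are illustrative, and the file path in the usage comment is a placeholder.

```python
import struct

# Fragment chosen in Step 2, as decimal values: 42 63 63 63 in hex.
FRAGMENT = [0x42, 0x63, 0x63, 0x63]
FRAGMENT_INDEX = 13  # assumed position of the fragment within the zigzag table


def encode(values, width):
    """Little-endian encoding with 1, 2 or 4 bytes per value."""
    code = {1: "B", 2: "H", 4: "I"}[width]
    return struct.pack(f"<{len(values)}{code}", *values)


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    offsets, pos = [], data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def candidate_table_starts(data, widths=(2, 1, 4)):
    """Map each width to the table start offsets implied by fragment matches."""
    result = {}
    for width in widths:
        hits = find_all(data, encode(FRAGMENT, width))
        # Step back from the match to where the table would begin.
        starts = [h - FRAGMENT_INDEX * width for h in hits
                  if h >= FRAGMENT_INDEX * width]
        if starts:
            result[width] = starts
    return result


# Usage (placeholder path, not a real filename):
#   data = open(r"C:\path\to\utility.exe", "rb").read()
#   print(candidate_table_starts(data))
```

Each candidate offset still needs a manual look in the hex editor, since a short fragment can match unrelated data (the 1-byte form in particular is only four bytes long).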

Step 4 - Finding the Quality Factor

Now that we know that the output is based on the JPEG standard tables (in my case, with an Approximate Quality Factor of 70), I have two choices:

  • Approach 1 - Search for and modify the Quality Factor variable
  • Approach 2 - Modify the Basis Table

While it is possible to modify the quality factor variable (and change it from, say, 70 to 95), the method to locate this variable is complicated and out of the scope of this discussion.

Therefore, I'll choose to modify the basis tables instead.

Step 5 - Reverse Engineering the Table

If the program is using the standard tables (let's call them the basis tables) and then calculating a new quantization table dynamically (i.e. from quality factor 70), then the process becomes a little more complicated.

We know that the program is internally using a formula to convert the basis DQT table to the output DQT tables. A formula that is very commonly used in the industry is the following (popularized by cjpeg and other utilities):

Quality = 1..100
if (Quality < 50) {
  ScaleFactor = 5000 / Quality
} else {
  ScaleFactor = 200 - Quality * 2
}

Loop [i] Through Matrix:
  NewQuantMatrix[i] = (StandardMatrix[i] * ScaleFactor + 50 ) / 100
					

In my case, I see that the Approximate Quality Factor is 70. With the above formula, I determine that:

ScaleFactor = 200 - (70 * 2) = 60
DQT_Output = ( (DQT_Basis * 60) + 50 ) / 100
DQT_Basis = ( (DQT_Output * 100) - 50 ) / 60
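
For readers who prefer to check the arithmetic in code, here is a minimal Python rendering of the formula above, using integer division as a C implementation would. The function names are illustrative. Encoders built on this formula typically also clamp each result to a minimum of 1, which is left out here to mirror the formula as written.

```python
def scale_factor(quality):
    """Scale factor (in percent) for a quality setting of 1..100."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    return 5000 // quality if quality < 50 else 200 - quality * 2


def scale_table(basis, quality):
    """Apply the scale factor to every basis value, rounding to nearest."""
    factor = scale_factor(quality)
    return [(value * factor + 50) // 100 for value in basis]


# First row of the standard luminance table at quality 70.
first_row = [16, 11, 10, 16, 24, 40, 51, 61]
print(scale_factor(70), scale_table(first_row, 70))
```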

If I want the Scanner Utility to produce an image that uses a JPEG compression Quality Factor of 97 (similar to decent current-day digital SLRs), I calculate a DQT Basis table assuming a DQT Output table that represents a Quality Factor of 97.

Taking my Canon 10D as an example of a high quality factor (~97), again using JPEGsnoop, I extract the tables as:

Desired DQT Luminance (decimal)
01 01 01 01 01 02 03 03
01 01 01 01 01 03 03 03
01 01 01 01 02 03 03 03
01 01 01 01 03 04 04 03
01 01 02 03 03 05 05 04
01 02 03 03 04 05 06 05
02 03 04 04 05 06 06 05
04 05 05 05 06 05 05 05
approx qual = 97.29

Desired DQT Chrominance (decimal)
01 01 01 02 05 05 05 05
01 01 01 03 05 05 05 05
01 01 03 05 05 05 05 05
02 03 05 05 05 05 05 05
05 05 05 05 05 05 05 05
05 05 05 05 05 05 05 05
05 05 05 05 05 05 05 05
05 05 05 05 05 05 05 05
approx qual = 97.51

Now, I must use the above formula to calculate what the new basis DQT tables should be to get the desired output DQT tables. Passing each value through the formula, and then converting to the 1, 2 or 4-byte representation (as determined earlier), I get the values below. Note that most of the time you can simply round down any fractional result you get; the exception is a result below 1 (as happens for a desired value of 1), which must be kept at 1:

Desired DQT Luminance (hex, 2-byte)
0100 0100 0100 0100 0100 0200 0400 0400
0100 0100 0100 0100 0100 0400 0400 0400
0100 0100 0100 0100 0200 0400 0400 0400
0100 0100 0100 0100 0400 0500 0500 0400
0100 0100 0200 0400 0400 0700 0700 0500
0100 0200 0400 0400 0500 0700 0900 0700
0200 0400 0500 0500 0700 0900 0900 0700
0500 0700 0700 0700 0900 0700 0700 0700

Desired DQT Chrominance (hex, 2-byte)
0100 0100 0100 0200 0700 0700 0700 0700
0100 0100 0100 0400 0700 0700 0700 0700
0100 0100 0400 0700 0700 0700 0700 0700
0200 0400 0700 0700 0700 0700 0700 0700
0700 0700 0700 0700 0700 0700 0700 0700
0700 0700 0700 0700 0700 0700 0700 0700
0700 0700 0700 0700 0700 0700 0700 0700
0700 0700 0700 0700 0700 0700 0700 0700

There we have it! Now we're ready to try it out.

Instead of calculating this all out for yourself, you are welcome to skip ahead and simply use the above values, as they will provide very high-quality output from your scanner utility. There is no need to match these quantization tables exactly -- a rough approximation to these will likely be far better than the built-in values provided with your utility.
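
If you would rather recompute the basis values than copy them, the inversion can be sketched in Python as follows. It assumes the scale factor of 60 worked out above, rounds down, and keeps every value at 1 or more; the function names are illustrative. It also runs the result forward through the scaling formula, which shows that rounding down tends to produce output values at or slightly below the desired ones (that is, slightly higher quality than the target).

```python
import struct

DESIRED_LUMA = [
    1, 1, 1, 1, 1, 2, 3, 3,
    1, 1, 1, 1, 1, 3, 3, 3,
    1, 1, 1, 1, 2, 3, 3, 3,
    1, 1, 1, 1, 3, 4, 4, 3,
    1, 1, 2, 3, 3, 5, 5, 4,
    1, 2, 3, 3, 4, 5, 6, 5,
    2, 3, 4, 4, 5, 6, 6, 5,
    4, 5, 5, 5, 6, 5, 5, 5,
]


def basis_from_output(output, scale=60):
    """Invert output = (basis * scale + 50) / 100, rounding down, minimum 1."""
    return [max(1, (value * 100 - 50) // scale) for value in output]


def output_from_basis(basis, scale=60):
    """Forward formula, used here only to check what the encoder would emit."""
    return [(value * scale + 50) // 100 for value in basis]


def hex_rows(values, per_row=8):
    """Format values as 2-byte little-endian hex words, eight per row."""
    words = [struct.pack("<H", v).hex().upper() for v in values]
    return [" ".join(words[i:i + per_row]) for i in range(0, len(words), per_row)]


basis = basis_from_output(DESIRED_LUMA)
for row in hex_rows(basis):
    print(row)
print(output_from_basis(basis)[:8])
```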

Step 6 - Modify the Executable!

Now we can make some actual modifications and cross our fingers...

MAKE A BACKUP COPY OF YOUR EXECUTABLE FIRST!!!

With your hex editor utility, select the range of bytes that you found earlier that contained the Luminance DQT table. Paste in the series of hexadecimal bytes that you just calculated above, arranged in the same value order as the table you are overwriting (the hex listings in Step 2 are in zigzag scan order, whereas the tables in Step 5 are listed row by row). Make sure that you are overwriting the exact same length as the original, and not inserting bytes into the file! (otherwise your executable will fail upon launch).

The snapshots below show the before and after view of the luminance table modifications.

[Screenshots: hex display before modification (left) and after modification (right)]

Now, repeat the same overwrite for the chrominance table (which may follow immediately after the luminance table). Save the executable file.
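
The overwrite can also be done with a short script rather than a hex editor. The following Python is only a sketch: `overwrite_bytes` is a hypothetical helper, the offsets and byte strings must come from your own investigation, and the ".bak" backup name is an arbitrary choice. It refuses to write if the bytes currently at the offset are not the ones you expected, and it replaces bytes in place so the file length never changes.

```python
import shutil
from pathlib import Path


def overwrite_bytes(path, offset, new_bytes, expected_old=None):
    """Overwrite len(new_bytes) bytes at offset, keeping a one-time backup."""
    path = Path(path)
    data = bytearray(path.read_bytes())
    end = offset + len(new_bytes)
    if offset < 0 or end > len(data):
        raise ValueError("patch would run past the end of the file")
    if expected_old is not None and bytes(data[offset:end]) != expected_old:
        raise ValueError("bytes at offset do not match the expected table")
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the untouched original
    data[offset:end] = new_bytes  # same-length slice: file size is unchanged
    path.write_bytes(bytes(data))
    return backup


# Usage (placeholder values, not real offsets):
#   overwrite_bytes("utility.exe", luma_offset, new_luma_bytes,
#                   expected_old=old_luma_bytes)
```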

Step 7 - Test it Out!

  • Open your scanner utility
  • Save a scan as a JPEG
  • Open the JPEG in JPEGsnoop and check the quantization table section

If you did everything correctly, you should see that your scanner is now saving with much higher JPEG image quality than before! Congratulations!
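
JPEGsnoop is the easiest way to make this check, but the quantization tables can also be read with a few lines of Python, which is handy when testing many scans. This is a minimal sketch of a DQT segment reader, assuming a well-formed file and looking only at segments before the image data begins; `read_dqt_tables` is an illustrative name. The values come back in zigzag order, exactly as stored in the file.

```python
import struct


def read_dqt_tables(jpeg_bytes):
    """Return {table_id: 64 values in zigzag order} from a JPEG byte string."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables, pos = {}, 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # end of image, or start of scan data
            break
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xDB:  # DQT segment, may hold several tables
            segment = jpeg_bytes[pos + 4:pos + 2 + length]
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:  # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


# Usage (placeholder filename):
#   tables = read_dqt_tables(open("scan.jpg", "rb").read())
#   print(tables.get(0), tables.get(1))
```

Table 0 is normally luminance and table 1 chrominance. Values mostly in the single digits would indicate that the modification took effect.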

If you succeeded in using this method with your scanner or other program, please share your results below!

 


Reader's Comments:

2012-11-19 Bouke
 I did hear/read some statements regarding the relative sensitivity of JPEGs to becoming corrupt over time. A few photographers I spoke to have told me this is the main reason why they store their data in RAW (the photos of my wedding are in .NEF format, for instance). I actually have noticed that some of my very old Jpegs did in fact become corrupt after sitting on the harddisk for multiple years. Supposedly shifting one single bit in a JPEG because of anything really causes the whole file to become corrupt?
Is there much truth to this?

As the11thplague also does, I also archive my scans in PNG or TIFF with LZW compression. Only when I badly need the space, I just convert them into (near)max quality jpegs. Depending on the purpose the native formats of the editor (paint.net's in my case) can also be very beneficial. I do not know enough about all the possible RAW formats however to actually use them to my advantage.
 Interesting question... The short answer is that, yes, it is safer to store your images in RAW (eg. DNG) format than JPEG.

Although JPEG files themselves are not more likely to get corrupted than any other file, the impact of a corrupted bit can be much greater than in other file types. This is due to the fact that highly-efficient compression mechanisms eliminate the redundancy that one needs to recover from errors. Other file formats (especially uncompressed ones) would certainly be "safer" for image storage on a medium that is susceptible to occasional errors/deterioration. A single bit error can cause significant distortion and/or color artifacts in an image. If the bit error occurs in certain header bytes, then it can make the entire file unreadable by most decoders (though that is often the case with other file formats too).
2009-04-02 the11thplague
 And what about the .png format? It is a lossless format, compressed like a zip. It's far smaller than .bmp, but it is just as good. Also, it has the transparency option which neither .jpg nor .bmp have. Plus, all new scanners provide the .png option, and our hard drives are so huge and cheap that there is no problem about space!
2007-07-29 mark cox
 This is an interesting article, with your technique, there is a low chance of failure, because false positives are unlikely when searching for a whole table. I would like to try this on the firmware for my all-in-one, or digital camera
 For sure. The tables that are used as the basis for dynamic DQT generation are not always identical to the ones printed in the JPEG standard. So, it may take a little more effort to locate a match in the table (that's why I started with the chrominance table, which is easier to identify). If, instead, the full table is hardcoded (i.e. no dynamic generation), then it becomes far easier.

As for the firmware modification, this is not always easy or even possible. Three big issues I think you may encounter:
  • Many firmware loads use instruction code compression and unpack it on the device.
  • Because errors in firmware loads may cause a device to no longer function, there is a high probability that the load is protected by an MD5 or other checksum, that will prevent it from being installed.
  • On most decent digital cameras, the JPEG compression is likely done in a hardware accelerator ASIC and hence you will not have access to the tables. Software-based encoders are only likely to be found on very low performance digicams and multi-purpose devices (such as your all-in-one).
Let me know if you manage to make any progress.

 

