Return to Digital Photography Articles

JPEG Huffman Coding Tutorial

After quantization, huffman / entropy coding is one of the more significant contributors to the file size savings in JPEG compression. This page provides a tutorial on how the huffman coding works in a JPEG image. If you have ever wondered how JPEG compression works, this may provide you with some detailed insight.

Why I wrote this tutorial

In attempting to understand the inner workings of JPEG compression, I was unable to find any real details on the net for how Huffman coding is used in the context of JPEG image compression. There are many sites that describe the generic huffman coding scheme, but none that describe how it will appear in a JPEG image, after factoring in the DHT tables, interleaved chroma subsampling components, etc. While it is relatively easy to understand the JPEG marker extraction, the Start of Scan data segment is the least understood and most important part of the image data. Therefore, I decided to create a page to walk through a decompression example. Hopefully others will find this useful!

The relevant sections in the JPEG Standard are quite obscure -- enough so that I set out to analyze several JPEG images to reverse-engineer how the huffman coding was being applied in a JPEG JFIF file.

Latest Update:

[09/22/2009]: Corrected Table 5 (added entry for DC 00 code).
[09/19/2008]: Corrected Table 1 (added entry for codes of length 9 bits).
[12/03/2007]: Corrected typo in text near Table 5 (code 00101). Added JPEGsnoop output (at end of Tutorial).
[01/27/2007]: Added section describing how to expand DHT into bit strings.

The Goal

The goal of this tutorial is to take a simple JPEG image and try to decode the compressed image data by hand, learning how the Huffman compression scheme works in the process.

Simplest JPEG Example

Most digital photos are full-color natural/organic images, which means that all three image components (one luminance and two color channels) will all have both low and high-frequency content. In addition, nearly all digital photos use chroma subsampling, which makes the extraction process a little more complicated. For the purposes of showing the basic huffman extraction, we will start with the simplest of all JPEG images:

  • Grayscale - no content in the two color channels
  • Solid color in each MCU - By making all pixels in an 8x8 block the same color, there will be no AC components.
  • No chroma subsampling - Makes scan data extraction simpler: Y, Cb, Cr, Y, Cb, Cr, etc.
  • Small Image - Total image size is 16x8 = two MCUs or blocks. This makes the extraction in this tutorial shorter.

Creating the Image

For the purposes of this tutorial, my working image will simply be a 16x8 pixel image, with two solid color blocks: one black and the other white. Note that each block is 8x8 pixels in size. The actual image is here: . If you want to download it, right-click and select Save Picture As...

Creating the sample image was trivial, working at 1600% view. Important that dimensions and any changes in the content are on 8-pixel boundaries. Overall image dimensions should be a multiple of 8 pixels as well, in both directions. The image below is a magnified version with a grid overlayed.

Once the image was created, it was saved with Photoshop CS2's Save for Web... command. This kept the file size down as it discards other extraneous file information (metadata, etc.) that is not relevant to this tutorial. Some other important points:

  • Use Save for Web - Reduces total file content to minimal subset.
  • Use Quality level 51+ - This ensures that there is no chroma subsampling enabled in the JPEG encoding process, according to the way that Photoshop Save for Web operates. I used quality 80 for this example.
  • Turn Optimized Off - For the purposes of this example, I think it is important to work with realistic huffman tables, not degenerate single-entry tables. Therefore I recommend that JPEG Huffman Table Optimization is left off.
  • Other settings: Blur off, Progressive off, ICC profile off.

Grayscale Photoshop Images

It should be noted that when you save a JPEG image from within Photoshop it always contains three components (Y, Cb, Cr). If you change the mode to grayscale (via Mode->Grayscale), the three components are still saved, even though the JPEG standard supports an image with only one component (which would be assumed to be grayscale).

What is Huffman Coding / Entropy Coding?

Huffman coding is a method that takes symbols (e.g. bytes, DCT coefficients, etc.) and encodes them with variable length codes that are assigned according to statistical probabilities. A frequently-used symbol will be encoded with a code that takes up only a couple bits, while symbols that are rarely used are represented by symbols that take more bits to encode.

A JPEG file contains up to 4 huffman tables that define the mapping between these variable-length codes (which take between 1 and 16 bits) and the code values (which is an 8-bit byte). Creating these tables generally involves counting how frequently each symbol (DCT code word) appears in an image, and allocating the bit strings accordingly. But, most JPEG encoders simply use the huffman tables presented in the JPEG standard. Some encoders allow one to optimize these tables, which means that an optimal binary tree is created which allows a more efficient huffman table to be generated.

For a reasonable explanation of how it works, please see this example of Huffman coding an ASCII string and the overview from Wikipedia.

For more details, please see my article on Optimized JPEGs - optimizing the huffman tables, particularly the first introductory sections and the section near the end titled "Standard Huffman Tables".

Decoding the JPEG Scan Data

Using JPEGsnoop

For those who are trying to understand the complex huffman decoding in a JPEG image, I'm happy to report that JPEGsnoop can now report all of the variable length code decoding for each MCU (use the Detailed Decode option). For the sample output, scroll to the bottom of this tutorial.

Decoding by Hand

The following is the decode method done by hand, which is obviously impractical for most images, but is shown here in detail to help one learn the process involved.


The above hex dump datastream shows the beginning of the Start of Scan (SOS marker 0xFFDA) marked in yellow, followed by some additional details in green and then the actual scan data selected in dark blue. Finally, the image is terminated with an End of Image (EOI marker 0xFFD9). So, the huffman-coded data content is only 9 bytes long.

Comparison of Compression File Sizes

For the sake of comparison, the original image (16 pixels by 8 pixels) contains a total of 128 pixels (2 MCUs). With 8 bits per channel (RGB), this corresponds to an uncompressed image size of 384 bytes (128 pixels x 8 bits/channel x 3 channels x 1 byte/8 bits). Clearly, using a run-length encoded format such as GIF would have produced even more image compression in examples like this (although GIF actually takes 22 bytes to code the stream because there are 16 separate runs). JPEG is not really designed to be optimized for this type of synthetic (non-organic) image.

If one uses optimized JPEG encoding, it is possible to reduce the image content size even further. In the example image, the optimized version has much smaller huffman tables (DHT) and shorter bitstrings to represent the same codewords. The net effect is that the image content size is reduced even further (to 7 bytes).

File FormatTotal Size Overhead Size Image Content Size
BMP (Uncompressed) 440 Bytes 56 Bytes 384 Bytes
JPEG653 Bytes 644 Bytes 9 Bytes
JPEG (Optimized) 304 Bytes 297 Bytes 7 Bytes
GIF60 Bytes 38 Bytes 22 Bytes

Scan Data Decode

The scan data is:

FC FF 00 E2 AF EF F3 15 7F

To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the 0xFF byte, followed by a non-zero byte). If the huffman coding scheme needed to write a 0xFF byte, then it writes a 0xFF followed by a 0x00 -- a process known as adding a stuff byte.

For our extraction purposes, we will replaceme any padding bytes (0xFF00 with 0xFF):

FC FF E2 AF EF F3 15 7F

The expectation is that image content is 3 components (Y, Cb, Cr). Within each component, the sequence is always one DC value followed by 63 AC values.

For each MCU, with no chroma subsampling, we would expect the following data to be encoded:


Note that some people get the order of the chrominance channels mixed up, and assume that it is YCrCb instead.

The figure below shows what the DCT matrix from a single MCU (8x8 pixel square) in a digital photo typically looks like. These are the entries after quantization, which has caused many of the higher-frequency components (towards the bottom-right corner of the matrix) to become zero. By the distribution of values in the frequency-domain matrix representation, it is possible to determine that the 8x8 pixel square had very little high-frequency content (i.e. it had only a gradual intensity / color change).

The DC component represents the average value of all pixels in the 8x8 MCU. Since we have deliberately created an image where all pixels in the 8x8 block are the same, we expect this value to represent either the black or white "color". The code provided in the DC entry (#0) indicates a huffman-encoded size (e.g. 1-10 bits) which is the number of bits needed to represent the average value for the MCU (eg. -511...+511).

Note that the DC component is encoded as a relative value with respect to the DC component of the previous block. The first block in the JPEG image is assumed to have a previous block value of zero.

Following the single DC component entry, one or more entries are used to describe the remaining 63 entries in the MCU. These entries (1..63) represent the low and high-frequency AC coefficients after DCT transformation and quantization. The earlier entries represent low-frequency content, while the later entries represent high-frequency image content. Since the JPEG compression algorithm uses quantization to reduce many of these high-frequency values to zero, one typically has a number of non-zero entries in the earlier coefficients and a long run of zero coefficients to the end of the matrix.

For the purposes of this tutorial, I have deliberately created an image that has constant color across all 8x8 pixels in each of the two MCU. Because there are no changes in value across each 8x8 pixel region, there is no AC (or higher frequency content) within the block. As a result, all 63 entries in the AC portion are expected to be zero (unlike the figure above). This allows us to focus on the DC component, which we do expect to change from MCU to MCU block.

The hex string shown earlier (after removal of padding bytes) can be represented in binary as the following:

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111

Extract Huffman Code Tables

Using a utility such as JPEGsnoop, you can extract the Huffman tables from the JPEG image file. Often, you will find four huffman table entries (tagged with a DHT marker):

  • DHT Class=0 ID=0 - Used for DC component of Luminance (Y)
  • DHT Class=1 ID=0 - Used for AC component of Luminance (Y)
  • DHT Class=0 ID=1 - Used for DC component of Chrominance (Cb & Cr)
  • DHT Class=1 ID=1 - Used for AC component of Chrominance (Cb & Cr)

The huffman compression tables are encoded in a somewhat confusing manner. Although you can draw out the binary tree by hand, it will be easier if you rely on a tool such as JPEGsnoop to generate all of the binary comparison strings for each huffman code in all four DHT sections.

The following four tables were extracted from the JPEG file that was created by Photoshop for the purposes of this tutorial. Other JPEG images may be reliant on different DHT tables, so it is important to extract them prior to analyzing the file. Note that turning on JPEG Optimization will create vastly different Huffman tables, with far fewer entries. For a point of comparison, the image described in this tutorial would only need optimized huffman tables of one entry each to represent our image content.

NOTE: It is important to realize that in each case the DHT entries in the JPEG file only list the Length and Code values, not the actual Bit String mapping. It is up to you to rebuild the binary tree representation of the DHT table to derive the bit strings! Please see the DHT Expansion section near the end of this tutorial for more details.

Table 1 - Huffman - Luminance (Y) - DC

3 bits 000
00 (End of Block)
4 bits 1110 07
5 bits 1111 0 08
6 bits 1111 10 09
7 bits 1111 110 0A
8 bits 1111 1110 0B

Table 2 - Huffman - Luminance (Y) - AC

2 bits 00
3 bits 100 03
4 bits 1010
00 (End of Block)
5 bits 1101 0
1101 1
1110 0
6 bits 1110 10
1110 11
... ... ...
12 bits ...
1111 1111 0011
F0 (ZRL)
... ... ...
16 bits ...
1111 1111 1111 1110

Table 3 - Huffman - Chrominance (Cb & Cr) - DC

2 bits 00
00 (End of Block)
3 bits 100
4 bits 1100
5 bits 1111 0 07
6 bits 1111 10 08
7 bits 1111 110 09
8 bits 1111 1110 0A
9 bits 1111 1111 0 0B

Table 4 - Huffman - Chrominance (Cb & Cr) - AC

2 bits 00
00 (End of Block)
3 bits 100
4 bits 1100 03
5 bits

1101 0
1101 1

6 bits 1110 00
1110 01
1110 10
... ... ...
9 bits ...
1111 1100 0
F0 (ZRL)
... ... ...
16 bits ...
1111 1111 1111 1110


Table 5 - Huffman DC Value Encoding

The following table shows how the bit fields that follow a DC entry can be converted into their signed decimal equivalent. To use this table, start with the DC code value and then extract "Size" number of bits after the code. These "Additional Bits" will represent a signed "DC Value" which becomes the DC value for that block. Note that this table applies to any JPEG file -- this table is not written anywhere in the JPEG file itself.

For example, let's assume that one was about to decompress a chrominance DC entry. If the previously-decoded "DC Code" was 05, then we must extract 5 bits following the code bits. If the next 5 bits were 00101, then this can be interpreted as decimal -26. The bits 10001 would be +17 and 11110 would be +30.

DC Code Size Additional Bits DC Value
00 0   0
01 1 01 -11
02 2 00,0110,11 -3,-22,3
03 3 000,001,010,011100,101,110,111 -7,-6,-5,-44,5,6,7
04 4 0000,...,01111000,...,1111 -15,...,-88,...,15
05 5 0 0000,... ...,1 1111 -31,...,-16 16,...,31
06 6 00 0000,... ...,11 1111 -63,...,-32 32,...,63
07 7 000 0000,... ...,111 1111 -127,...,-64 64,...,127
08 8 0000 0000,... ...,1111 1111 -255,...,-128 128,...,255
09 9 0 0000 0000,... ...,1 1111 1111 -511,...,-256 256,...,511
0A 10 00 0000 0000,... ...,11 1111 1111 -1023,...,-512 512,...,1023
0B 11 000 0000 0000,... ...,111 1111 1111 -2047,...,-1024 1024,...,2047

Block 1 - Luminance

Luminance (Y) - DC

Referring to the Y(DC) table (Table 1), we start with the first few bits of the coded stream (1111 1100 1111...) and recognize that code x0A matches the bit string 1111 110.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 0A

This code implies that hex A (10) additional bits follow to represent the signed value of the DC component. The next ten bits after this code is 0 1111 1111 1. Table 5 above shows the DC values represented by these "additional bits" -- in this case, the bit string corresponds to a value of -512.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Value: -512

Our progress so far:

Bits1111 1100 1111 1111 1 110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
ComponentY ???
AC/DC DC ???
Value -512 ???

Luminance (Y) - AC

After the DC component, we begin the 63-entry AC matrix for the Y Luminance. This uses a different Huffman table (Table 2).

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

In the above huffman code table, the code 1100 corresponds to an EOB (End of Block). Therefore, the AC components was cut short early (no other codes). This means that all 63 entries of the matrix (all entries except the 1st entry, which is the DC component) are zeros. Since we have finished the luminance component, we then move on to the chrominance components (Cb and Cr).

Bits1111 1100 1111 1111 1 1100010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
ComponentY ???
Value -512 0???

Block 1 - Chrominance

Chrominance (Cb) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

End of chrominance DC, start on AC.

Chrominance (Cb) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

Again, the AC is terminated right away. Now, we move on to the second chrominance channel, Cr.

Bits1111 1100 1111 1111 1 11000101010 1111 1110 1111 1111 0011 0001 0101 0111 1111
Component Y Cb ???
Value -512 000???

Chrominance (Cr) - DC

Refer to Table 3 for the relevant Huffman codes.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

This marks the end of the DC.

Chrominance (Cr) - AC

Refer to Table 4 for the relevant Huffman codes.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

This marks the end of the AC.

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 0011 0001 0101 0111 1111
ComponentY Cb Cr ???
Value -51200000???

Block 2 - Luminance

Luminance (Y) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 0A

This code indicates that the value is stored in the next ten bits (A in hex is 10 in decimal):

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Value: +1020

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 0011 0001 0101 0111 1111
ComponentY Cb Cr Y???
Value -512 00000 +1020 ???

Luminance (Y) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111

The end-of-block indicator means that all remaining values are zero. Since we haven't even started with the first value, all 63 values can be interpreted as zero. This means that there is no non-DC image content, which is to be expected since all 64 pixels (8x8) in the block are the same color.

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 00 1100 01 0101 0111 1111
ComponentY Cb Cr Y???
Value -512 00000 +1020 0???

Block 2 - Chrominance

Chrominance (Cb) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cb) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cr) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cr) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 00 1100 0101010111 1111
ComponentY Cb Cr Y Cb Cr X
Value -512 00000 +1020 00000X


Because the scan data must end on a byte boundary, there may be a remainder of bits that will simply be tossed. In this case we see 6 bits (all-1's) that will be discarded.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111

Conversion to Spatial Domain

Now that we have the DCT (Discrete Cosine Transform) values, we can try to determine what the content of the image was in the spatial domain. Remember that the DC & AC values that we decompressed from the huffman coding are the frequency-domain representation of the image, not the real YCbCr values.

Given that all of the values in the Cb and Cr channels (chrominance) are zero, we can assume that the image is grayscale and instaed focus on the luminance values (brightness).

Second, all of the AC component values are zero, which means that there is no higher frequency content in the images -- in fact, we can determine that all of the image data in each 8x8 block is of the same color & intensity (i.e. only the DC component remains).

We can determine the following:

BlockEncoded (Relative) DC Value
1Y = -512
2 Y = +1020

Relative DC to Absolute DC

Note that the DC components are encoded as a difference from the previous block (with the first block assumed to start relative to zero). Therefore, we know that block 1 has a DC value of -512, but that block 2 has a DC value of +1020 relative to block 1. So, we now know:

BlockActual (Absolute) DC Value
1Y = -512
2 Y = +508


Finally, we want to convert the DCT DC value to an RGB value. Assuming that the gain of the DCT transform is 4, we divide the values by 4 to get block1 = -128, block2 = +127.

Now, we have to convert between YCbCr and RGB. Please refer to the formulae provided on the JPEG color conversion page. There, you will see that we require a level shift of +128 to get input values in the range (0..255).

Color conversion yields RGB values of:

BlockRGB Value (hex)RGB Value (decimal)
2 (0xFF,0xFF,0xFF) (255,255,255)

... and these are the original values that were used to create the JPEG image!

Expansion of DHT Table into binary bit strings

Note that the DHT tables that appear in the JPEG file only define the number of codes for each bitstring length, followed by a sequence of all the code words. The bit strings that are used to represent each of these code words is not included! It is the job of the JPEG decoder to derive these bit strings from the little information that is provided.

There are many methods in which the generation of the binary strings is performed in a decoder, and most of them are heavily optimized for performance (e.g. the djpeg huffman routines). These implementations are quite difficult to learn from. I find it much more instructive to generate these bit sequences by drawing out the huffman binary tree by hand.

In an extreme simplification, binary trees are composed of a root node which has a left and right branch to other nodes. Each of these nodes can either be a leaf node (the end of a branch) or split further into two more nodes.

The actual DHT in the JPEG file:

The following was extracted directly from the DHT (Define Huffman Table, shown above in Table 2: Luminance AC) in the JPEG file using JPEGsnoop.

  Class = 1 (AC Table)
  Destination ID = 0
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (003 total): 11 04 00 
    Codes of length 05 bits (003 total): 05 21 12 
    Codes of length 06 bits (002 total): 31 41 
    Codes of length 07 bits (004 total): 51 06 13 61 
    Codes of length 08 bits (002 total): 22 71 
    Codes of length 09 bits (006 total): 81 14 32 91 A1 07 
    Codes of length 10 bits (007 total): 15 B1 42 23 C1 52 D1 
    Codes of length 11 bits (003 total): E1 33 16 
    Codes of length 12 bits (004 total): 62 F0 24 72 
    Codes of length 13 bits (002 total): 82 F1 
    Codes of length 14 bits (006 total): 25 43 34 53 92 A2 
    Codes of length 15 bits (002 total): B2 63 
    Codes of length 16 bits (115 total): 73 C2 35 44 27 93 A3 B3 36 17 54 64 74 C3 D2 E2 
                                         08 26 83 09 0A 18 19 84 94 45 46 A4 B4 56 D3 55 
                                         28 1A F2 E3 F3 C4 D4 E4 F4 65 75 85 95 A5 B5 C5 
                                         D5 E5 F5 66 76 86 96 A6 B6 C6 D6 E6 F6 37 47 57 
                                         67 77 87 97 A7 B7 C7 D7 E7 F7 38 48 58 68 78 88 
                                         98 A8 B8 C8 D8 E8 F8 29 39 49 59 69 79 89 99 A9 
                                         B9 C9 D9 E9 F9 2A 3A 4A 5A 6A 7A 8A 9A AA BA CA 
                                         DA EA FA 
    Total number of codes: 162

So, how do we create the binary bit-strings for each of these codes?

Well, we start to build a binary tree, creating new branches and putting the code words into leaf nodes of the tree. Row 1 of the tree contains code words that only require 1 bit to encode, row 2 contains code words (leaf nodes) that only require 2 bits to encode and so on.

We first start with row 0:

  • Row 0 (the root node) is almost always a parent node, creating a left and a right branch down to the next row. We label the left branch 0 and the right branch 1.
  • At row 1, we try to fill in any of the nodes (there are 2 at this level) with code words that take 1 bit to encode. We see from the DHT that there are no codes of length 1 bit. So, for each of the two nodes we spawn off a left and right node below (on row 2), creating a total of 4 nodes. Again, for these branches, we label the left one 0 and the right one 1.
  • At row 2, we will try to fill in any codes of length 2 bits, starting from left to right. This time we see in the DHT that there are two codes that can be encoded with bit strings of length 2. So, we take the first code value x01 (hex) and place this in the first node, making it a leaf node (no further branches will come from it). We then take the second code value x02 and place this in the second node, making it too a leaf node. At this point we have two more nodes left in this row of the tree but no more codewords listed in the DHT for this bitstring length. Therefore, we create two branches for each of the remaining nodes of this row. Since two nodes are left, this will create a total of 4 branches, meaning that there will be again 4 nodes in the third row of the binary tree.
  • At row 3, we have four nodes available, but the DHT indicates only one codeword uses 3 bits to encode. So, we place x03 at the leftmost of these nodes, and we create branches for each of the remaining three nodes (making 6 nodes for row 4).
  • At row 4, we have six nodes available, but the DHT indicates only three codewords use 4 bits to encode. This time we terminate three nodes (making them leaf nodes) and we further extend the other three down to row 5.

This process continues until we have used all of the codewords defined in the DHT table. The diagram below shows the expansion of the first four rows of the above DHT.

Binary Tree Expansion of DHT

NOTE: I understand from VG that the above representation may be "right-oriented" rather than "left-oriented"

Once the binary tree has been completed, you can read off the bit strings for each code word by combining the bit labels of each branch on the path down from the root node. For example, code word 0x04 is encoded by the binary bit string 1011 because we need to take branch 1 from the root node, 0 from the node on row 1, 1 from the node on row 2 and branch 1 from the node on row 3.

This process can be quite laborious, and the binary tree often takes on a shape that would be difficult to draw through to completion. To make this easier, I have added a feature, DHT Expand, to JPEGsnoop that determines the binary bit strings for each of the code words appearing in the DHT table.

Below is the expansion of the first few rows of the table, using JPEGsnoop with the DHT Expand mode enabled.

  Expanded Form of Codes:
    Codes of length 02 bits:
      00 = 01
      01 = 02
    Codes of length 03 bits:
      100 = 03
    Codes of length 04 bits:
      1010 = 11
      1011 = 04
      1100 = 00 (EOB)
    Codes of length 05 bits:
      11010 = 05
      11011 = 21
      11100 = 12
    Codes of length 06 bits:
      111010 = 31
      111011 = 41
    Codes of length 07 bits:
      1111000 = 51
    Codes of length 15 bits:
      111111111000100 = B2
      111111111000101 = 63
    Codes of length 16 bits:
      1111111110001100 = 73
      1111111110001101 = C2
      1111111111111100 = DA
      1111111111111101 = EA
      1111111111111110 = FA

Implementations in real JPEG decoders optimize this mechanism heavily as the bitstring search / parsing process is not a trivial task in processors designed to work with 32-bit words or bytes. Most of them end up creating large lookup tables that define lower and upper bounds for a match of a particular bitstring / code.

More Realistic JPEG Images

Obviously the above is an extremely simple example JPEG image. However, real images will have some other characteristics that you will encounter:

  • Chroma subsampling - You can expect that most photos will have 2x1 chroma subsampling, which will mean that the decoding sequence per MCU will be Y1 Y2 Cb Cr instead of Y Cb Cr.
  • AC components - You most certainly will have non-zero AC coefficients, unlike the above. In such a case, you will generally have a few non-zero values, which will eventually be terminated with a ZRL (code word 0xF0) which indicates a run of 16 zeros and then probably a EOB (code word 0x00).

Some more detail regarding Huffman coding with chroma subsampling and other factors are described in the JPEG decoder page.

JPEGsnoop Detailed Decode Output

Running JPEGsnoop on the image shown above, with the Scan Segment->Detailed Decode option enabled, the following output is shown:

*** Decoding SCAN Data ***
  OFFSET: 0x00000282
  Scan Decode Mode: Full IDCT (AC + DC)

    Lum (Tbl #0), MCU=[0,0]
      [0x00000282.0]: ZRL=[ 0] Val=[ -512] Coef=[00= DC] Data=[0x FC FF 00 E2 = 0b (11111100 11111111 1------- --------)] 
      [0x00000285.1]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x E2 AF EF F3 = 0b (-1100--- -------- -------- --------)] EOB
                      DCT Matrix=[-1024     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[0,0]
      [0x00000285.5]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x E2 AF EF F3 = 0b (-----01- -------- -------- --------)] EOB
      [0x00000285.7]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x E2 AF EF F3 = 0b (-------0 1------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[0,0]
      [0x00000286.1]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x AF EF F3 15 = 0b (-01----- -------- -------- --------)] EOB
      [0x00000286.3]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x AF EF F3 15 = 0b (---01--- -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Lum (Tbl #0), MCU=[1,0]
      [0x00000286.5]: ZRL=[ 0] Val=[ 1020] Coef=[00= DC] Data=[0x AF EF F3 15 = 0b (-----111 11101111 111100-- --------)] 
      [0x00000288.6]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x F3 15 7F FF = 0b (------11 00------ -------- --------)] EOB
                      DCT Matrix=[ 2040     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[1,0]
      [0x00000289.2]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x 15 7F FF D9 = 0b (--01---- -------- -------- --------)] EOB
      [0x00000289.4]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x 15 7F FF D9 = 0b (----01-- -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[1,0]
      [0x00000289.6]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x 15 7F FF D9 = 0b (------01 -------- -------- --------)] EOB
      [0x0000028A.0]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x 7F FF D9 00 = 0b (01------ -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]


Reader's Comments:

Please leave your comments or suggestions below!
 Ha, turned out my way of dealing with stuffed bytes had a bug; if I had FF 00 FF 00 XX and a Huffman code started in the first FF and ended on the first bit of the XX then I'd loose track of which bit I last read. Thanks for the link, it motivated me to reexamine stuff for the umpteenth time.

 Excellent! Glad you managed to figure that out!
 I don't know if you're familiar with the TIFF format but DNGs are basically an extension of the TIFF format to account for camera parameters. That said, my DNG images are broken up into several hundred smaller tiles that are embedded in the DNG. Each tile is its own self-contained Jpeg and while the standard's pretty open, mine are compressed using the lossless encoding from the original Jpeg standard (10918-1 / T.81) so it's very simple: made a prediction based on previously decoded samples and a few supplied equations, calculated the deviation from that tile's Huffman tables, add them to get the sample value, repeat. All the necessary markers are there so I don't think it's a tile padding issue and 95% of my tiles decode properly and look as I'd expect so I seem to be correctly handling the padding associated with 0xFF bytes in the compressed bit stream. I've also tried skipping 1-17 bits after encountering errors and different ways of handling the FF00FF00 codes but while some tries look less bad none fix it. I suppose I'll soon be digging into the Adobe DNG SDK, which I've been hoping to avoid. Again, thanks for the help.
 Thanks for the clarification on how the DNGs work. The only thing that comes to mind is the JPEG decoder deviation referenced here. Other than that, I'm not sure what you might be observing with that encoding scenario. Good luck and let me know if you happen to uncover the cause!
I'm implementig a jpeg-decoder and I'm now unsecure, because when I take the small example picture above you used for this tutorial in JPEGsnoop there are 2 QT and 4 Huffman-tables for that picture, however in the hex form of this file there is just one DQT and DHT segment. How it could be explained? Must not one DQT or DHT segment contain just one table?

 Hi Usam -- when you say there is only one DQT and DHT segment in the hex for form of this file, can you clarify which place you might be looking? (ie. the tutorial text or what you see in JPEGsnoop under the Marker: DQT or Marker: DHT)? Thanks!
 Just curious, do you know what to make of bit sequences that don't match any codes? I'm working with a DNG file where the individual tiles use a lossless jpeg encoding. Most decode properly but a few contain the hex sequence 0xFF 0x00 0xFF 0x00 so after removing the stuffed bytes I'm left with long strings of 1 bits that don't match any codes. Also, the files open and display properly in the Adobe software that converted them from RAW. Thanks and your site's already been a ton of help!
 Hi Chad -- I haven't looked specifically at how DNG is encoded, but I am wondering if perhaps what you are observing is some additional fill bits/bytes at the end of each tile? This would be similar to how the JPEG RSTn (restart) markers need to pad up the bits to align to a byte boundary. I would be interested to know if you can correlate this invalid huffman code stream to a tile boundary. If you experiment with the number of bits to skip after you start the invalid sequence, are you able to regain sync again?
 Hi Clavin,

Thank you for explanation! I was wondering if you could please also explain the Huffman Code Histogram stats?

Thank you
 The Huffman code histogram stats identifies how frequently each variable length [Huffman] code appears within the encoded image. If the encoder produced an optimum a huffman table, then the majority of the image would be created with shorter codes, which would result in larger percentage values for the code lengths near the top of the histogram list.
 Please if anyone can help me with the code of huffman,i m getting binary values after aplying the algorithm ,don't know how to convert it into an compressed image???plzz help
 Calvin, thank you much for your tutorial and examples! I need to implement (or have implemented, let me know if you are interested in being paid to do it) a fairly efficient decoder for extracting just the greyscale DC value for each 8x8 block. I am wondering if there is any reasonable way to jump to the start of each block (or end of previous block) - or if all of the codes must be parsed. On quick glance, it looks to me like the EOB marker can only be found when it occurs in the context of next symbol to extract - not from a simple scan. The application is to parse data in a microcontroller, with the data originating in an imager chip that is streaming jpeg compressed image. I only want the 1/8 x 1/8 greyscale image. Thx!
 Hi Michael --

Glad to hear the tutorial was useful! Extracting the luminance DC value for each block is certainly faster than decoding the full image, but unfortunately (as you've discovered) it is not particularly easy for a decoder to implement random-access to the encoded blocks in the general case. The EOB bit strings are almost always found at the end of an encoded block (if there are less than 63 AC elements after quantization), but unfortunately the bitstring for EOB (eg. 4'b1010) can alias with a great number of other huffman codes or their associated signed value. Therefore, jumping ahead can actually be quite a challenge. If the encoder can be configured to embed restart markers liberally throughout the image then the decoder can take advantage of these to perform some limited random access by searching for the RSTn codes.
 please can u give an example to decode if all AC components not zeros
 Hi -- I don't have an example file that would demonstrate a "full" set of AC components, but basically we would expect to see the DC component followed by 63 AC components (without the EOB code at the end).
 Hi Calvin,

thanks for your research regarding my MCU database! Here are some thoughts on your answers:

- large key sizes: one could use SHA-1 checksums (20 bytes) for encoding the compressed MCU (approx. 31 bytes), which would save 1/3 of the filesize for a quality 90 JPEG file (however we should not forget the extremely large size of our hypothetical database and the fact that i can gain similar savings by reducing the jpeg quality with almost no effect on the visual quality of the image)

- another idea would be not compress/index entire MCUs, but individual chroma/luma tables within the MCU (I suppose your calculation of up to 64 entries of about 10bits each refers to a table, not an entire MCU). However I think it is not possible to account for all table permutations, rather you have to use a prediction model like the one in AVC/H.264 ( to predict MCU tables based on the surrounding tables. It would be interesting to know if "backporting" such a feature to JPEG makes sense (e.g. in the form of a decoding preprocessor that restores the original MCU values before the decoding happens). At least it would be a lossless operation (re-encoding to another image format isn't)

- btw, can you tell me why JPEGSnoop 1.7.3 reports 3 luminance tables in 1 MCU? (lines 263- 336)
 Hi Franz --

Thanks for sharing your interesting ideas!

Key size:
  • Using cryptographic hashes (such as SHA-1) could potentially cut down on the lookup key / table size, but you would then also run the risk (very small, though) of having data corruption due to hash collisions. It is very unlikely that two real-world MCUs would produce the same hash, but it could be possible. If your transencoder mapped two different MCUs to the same hash key then you would corrupt during transdecode.
  • As you mention, an increase in compression (eg. through increased quantization factors) could allow for significant reduction in storage size without perceptible image degradation. Some techniques have been used in optimizing this recompression (based on image content) to find the optimum balance between compression and error (according to the human visual system limitations).

Compressing chroma/luma tables:
  • I'm not quite sure I understood the suggestion fully. Did you mean compress / create a lookup for the DQT / DHT tables? While there are a very wide range of possible DQT tables, the storage is very minimal (eg. 64 bytes for a DQT) and only appears once in the file. Perhaps I misunderstood the idea? Since the encoded MCUs can indeed be long and repeated throughout the image, it is conceivable that repeats could be identified and compressed further. However, it is unlikely in a natural "real-world" image to find such repeated MCUs.
  • Thanks for providing the link to the AVC/H.264 encoding methodology -- it was a great read. Your idea of leveraging neighboring pixel elements in a JPEG post-encoder stage (or pre-decoder) is interesting. To a degree, I think something similar has been done with the arithmetic coding mode of JPEG. I have seen some software advertised that effectively transcodes an existing baseline JPEG into an arithmetic coded JPEG to gain further file size reduction. You might want to take a deeper look into that.

3 Luminance tables:
  • I assume you are referring to the 4 Lum (Tbl #0), in the detailed Decode Scan? If so, the reason we see 4 luminance entries (MCUs) before the chrominance ones is that this particular image uses chroma subsampling to further the compression of the chrominance channels versus the luminance channels. In this case the image uses "2x2" subsampling which means that it has halved both the horizontal and vertical resolution in the chrominance, but kept the full resolution in the luminance channel. When encoding such an image, the actual MCU size is now 16 x 16 pixels (instead of 8x8). For that 256 pixel block, there are effectively 4 luminance MCUs and 1 chrominance MCU. The encoder records the luminance MCUs first before continuing on with the chrominance.
 Hi Calvin,

recently I have been contemplating about an MCU "database". My idea is to generate all possible MCUs for a certain JPEG encoder setting (e.g. libjpeg, quality 80) and store them into a database. Then i would remove all MCU data from a JPEG file (which was encoded using the same settings) and replace it with links to my database.

What I hope to achieve is a significant reduction in JPEG filesize without the drawbacks of encoding the file to another format (e.g. quality loss, incompatible tools). In fact, the original JPEG image could be restored by replacing the MCU database links with the actual MCU data.

I am not sure if my idea is feasible, because it depends on a few assumptions (e.g. the average MCU size in bytes is not bigger than the database key that is needed to reference them => MCUs are "big" enough, not too many different MCUs). So I really appreciate your answers to the following questions:

- what is the average size of a MCU in bytes?
- probability of duplicate MCUs in one image or across similar images (which were encoded using the same image encoder and JPEG quality settings)?
- how many different MCUs can be produced/encoded with a certain JPEG encoder setting (say libjpeg, quality 80)? (if i approach the problem using pixel permutations, e.g. how many different 8x8 pixel/MCU tiles can be created in a 4-bit-grayscale image, i quickly end up with gazillions of possibilities. but maybe a quite an amount of permutations can be considered equal due to the DCT transform?)
- could a partial solution be successful (e.g. ignore all MCUs with less than n bytes length)?


 Franz -- certainly a fascinating idea!

It may take me some time to get some approximate statistics for you, but in the meantime, here is what I expect we'll find: (note that these are just quick thoughts, so I may need to rethink this again in more detail!)

  • Assuming you are handling "natural" JPEG images (ie. from real-world source material), it seems highly likely to me that the number of possible MCUs could be unrealistic to pre-calculate and index.
  • We have to consider that the MCU (without subsampling) can be up to 64 entries long. To keep things simple, let's assume that the image doesn't use any ZRL in the matrices.
  • The huffman codes typically provide up to 10 bits for the value field. So, at the simplest view, an MCU could have a range in permutations defined by 64 entries of 10 bits each. Given that order matters, this is an excessively large number of permutations! (is it 6x10^191?)
  • If so, the lookup table would be immense, and the index key length would also be extremely large.
  • As for the potential file savings, I just took an example real-world image that had a quality factor of 90. With 2x1 chroma subsampling, the average MCU length worked out to approx 31 bytes per MCU.
  • Given real-world images, I expect that the chance of duplicate MCUs (even between files) could be very low, hampering the efforts to achieve the compression you're after
  • Also, unless I misunderstood, the lookup table you're defining would be specific to a single huffman table. While there are some that are commonly used, this could present a challenge when dealing with existing encoded files (you would need to transcode first).
2015-07-07Bilal Maskaleh
 I read some papers which discuss 16x16 DCT and quantization blocks instead of 8x8.
I'm trying to develop a 16x16 block JPEG encoder with a quantization table of 16x16 block.
the problem occurs when I'm trying to write the JFIF header of the file in the DQT (Define quantization table) part of the header. in my case the length is (256+3 = 259). after saving the image i cannot either read or show it on matlab or any other viewer. i used a tool to read the JFIF header, so i noticed that the 16x16 quantization table is divided into many 8x8 tables as shown in the image attached to this email.
is there any way to read the (16x16) 256 elements from JFIF header as 1 block??
 Hi Bilal --

It's not quite clear, but are you trying to encode 16x16 pixel blocks with baseline JPEG using a quantization sampling factor of H=2,V=2? (ie. the 16x16 pixel blocks typically encoded when chroma subsampling is active). If so, the quantization table should still only contain 64 elements. I may be mistaken, but I didn't think JPEG T.81 (1992) was specified to support 256 (16x16) quantization coefficients.
 Hello, I have a couple of questions :

I want to know if is it possible to detect all the DC in a JPEG image?

I don't understand : "Note that the DC component is encoded as a relative value with respect to the DC component of the previous block. The first block in the JPEG image is assumed to have a previous block value of zero."

Is the JPEG file a set of 64 pixels (block 8*8) or the size of each block depends on the number of deleted 0 ie (After the zig-zag process the 0 are removed from the block ?)
 Ignoring chroma subsampling for the moment, the JPEG image is always broken down into the 64-pixel (8x8) blocks. The number of "deleted 0s" doesn't affect the block size, but instead affects the number of AC components that are encoded in the resulting compressed bitstream. So, we take an 8x8 pixel block, perform the IDCT, and then encode the DC coefficient. The DC value (for the entire block) is encoded as a delta from the previous block's DC value. We then proceed to encode up to 63 AC coefficients by performing the zig-zag traversal through the matrix. We can then "delete 0s" by representing them with a run-length or else terminating early with an EOB. Hope that helps!
 Hello Sir, your pages are really the best source to understand JPEG in detail! Thanks.
In my implementation of JPEG Encoder, I sometimes (for some images) get 'bad huffman string' error while for some other images the compressed data becomes entirely different in comparison to the original image (For example, a red square becomes a green one!).
In my huffman tables, the EOB marker for Luma components match with one of the huffman codes for Chroma AC components. Is this problem of huffman codes because of such conflicts between huffman codes and end of block markers? Can you please provide some solution to this problem?
Thanks in advance.
 Hello Sir, Thanks for the valuable tutorial on JPEG Huffman Encoding.
I have one doubt that why standard software like MS-Paint, etc. change their huffman tables for different images. What I am trying to say is that, I took some images and saved them as .jpg files using MS-Paint. Then I examined each of those images using JPEGsnoop. The huffman tables displayed by JPEGsnoop were different for these images. Are these huffman tables static or do they change according to the content of the image (i.e. pixel values of image).
Can you please make this doubt clear and elaborate little more on Huffman coding tables.
Thanks in advance.
 Hi... I am trying to develop JPEG decoder for my project.iam using ur jpeg tool to verify at is really working fine.

But after getting the DCT values which are perfectly matched with the tool out,then I am doing level shifting and then rgb conversion.the rgb values are not matching.i am using the formula


Please help
Thanks and regards
 Hi Prashant -- you might want to see if the formula I have at the bottom of the JPEG color conversion page works better for you?
The article above is very helpful. Thank you for the tutorial. Could one have extracted the huffman tables from JPEG images using the original version of JPEGSnoop. Also, are you a professor or teacher? Feel free to send me an email. Many thanks.
 Glad you found it useful! Yes, even the very first version (v0.1) of JPEGsnoop was able to extract both quantization and huffman tables. As for being a professor, no -- just someone with an interest in how things work :)
 Hi, Thanks for this great tutorial, it is very helpful in my assignment. I am decoding a 64x64 real image, and I found this code from JPEGSnoop for the second non-zero AC coef
Codes of length 08 bits:
11111000 = 14 (Total Len = 12)
does it mean 1 zeros in in the location 2 of zigzag then 4-bit magnitude for the location 3?
2013-10-24Erkam Uzun
 Hi Calvin,
I want to ask about DRI(define restart interval) tag. When I look many of the jpegsnoop output it puts DRI tag info for some jpegs while it does not for others. And, after the DRI tag I see a length=4 and an interval=0 or 4 fields. Can you give me some details about reading these fields. In which field I must explore the length and interval value after I see a DRI tag? Second, if Jpegsnoop does not put a DRI tag info for a jepg file, does it mean there is no restart markers in the Jpeg file?

 Relatively few digicams use the "Restart Markers". It increases file size a little but does make the images more robust against errors. The length of 4 just indicates the length of the section in the file that indicates whether or not the markers exist. The interval (eg. 4) indicates how often the markers are to appear in the image data (the number of MCUs). If JPEGsnoop does not show a "Restart Interval" section in the report then there are no restart markers in the image data.
 Hi Calvin,

I am confused about constructing Huffman codeword from intermediate sequence of the symbols created with zigzag scan. More detailed, for example, how could we transform the luminance-AC zigzag scan symbol of "(zero count, size)-->(1,2)" pair to a huffman code word. And, how could we reconstruct the symbol "(1,2)" from codeword.


I want to create raw data from MCU's which decompressed from JPEG image. How I put the MCU's, How must be order of MCU's?

Thanks for these articles....
 Hi Calvin,

Thanks for the article, but I have a question. With have no chroma subsampling you say that each block needing encoding is represented with the following sequence:


What if we have subsampling? For example with [4 2 0] subsampling, for a 512x512 image we will end up with a

512x512 Y table
256x256 Cb table and
256x256 Cr table

How are we going to create the above sequence since the number of blocks of each table are not equal?

Thanks a lot.
 Good question. With chroma subsampling, the luminance components are encoded first (usually 1,2 or 4 blocks) followed by the chrominance components.

So, in your example [4 2 0], we would have:

... where Y11, Y12, Y21, Y22 represent the luminance of each16x16 block of pixels.
 Thanks a lot Calvin.
 Hi, really enlightening article although i must admit quite tough for me to digest. You say that The huffman compression tables are encoded in a somewhat confusing manner. I'd like to be able to determine if huffman tables in a non viewable jpeg file are corrupt or not as a possible reason for the viewer not to open the file properly.
Could you tell me a little bit more about how these tables are encoded in a jpg file?.
Thank you so much for sharing your work with us.
 Hello, excellent article.

Regarding the DC coding: the DC value is the difference from the previous blocks' DC value. Does this mean it is the difference from the previous block in the bit sequence, or the previous block in the image (say left-to-right)?

For example, for 4:2:0 sampling, the first four Y blocks actually span two 8x8 'rows' of the image:

Y00 Y01 Y10 Y11 Cb Cr, ...

Is the DC in each Y the difference from the one that preceded it in this sequence?

Thanks again for all the info!
 The differential encoding for the DC is done from the perspective of the previously-encoded DC block, not the visually-adjacent 8x8 block in the image.
 very nice article..may i knw wht is mean by ZRL and some explanation about it in detail
 ZRL is "zero run length" -- it represents the number of zero coefficients before the next non-zero value. This is an easy means of compressing the coefficients in the DCT matrix since it is common to have zeros in some areas (eg. high frequency coefficients).
 Hi Calvin,

I'm a programmer in one of scientific centers in Poland and I want to use a CR2 files from a Canon EOS 550D to receive datas from detectors. I have reviewed, and some articles about Huffman coding and decoding (as Your) - I've written a software which can get out a Huffman Table an properly codes (the same as in Your JPEGSnoop). But I have a problem - after 0xFFDA marker and proper length compressed data is coded.

From Huffman table there are:
00 = 0x06
010 = 0x04
011 = 0x08
100 = 0x05
101 = 0x07
1100 = 0x03
1101 = 0x09
11100 = 0x00 (that is EOB)
11101 = 0x0a
11110 = 0x02
111110 = 0x01
1111110 = 0x0c
11111110 = 0x0b
111111110 = 0x0d
1111111110 = 0x0e

The first 16 characters after 0xFFDA and length (0x0E) are:
ff 00 7f ff 00 fd fa ff 00 f7 eb ff 00 5d bf f2
So it is (in binary):
11111111 00000000 01111111 1111111 00000000 11111101 11111010 11111111
00000000 11110111 11101011 11111111 00000000 01011101 10111111 11110010
But the 0xFF00 should be changed to 0xFF, so it should be:
ff 7f ff fd fa ff f7 eb ff 5d bf f2
So it is (in binary):
11111111 01111111 1111111 11111101 11111010 11111111 11110111 11101011
11111111 01011101 10111111 11110010
And decoding should be:
First 11111111 0 is 0x0d,
but there is 1111111 1111111 111111 which doesn't match with any code from Huffman tree. What I should do?
Please write an e-mail to me so I can send You detailed data.
Best regards,
 Hi Wojtek -- Unfortunately, I haven't really spent any time decoding lossless JPEG or raw files. The analysis you have shared looks on-track, assuming that your CR2 files don't have any additional headers or extra bytes interleaved in the Huffman coded stream and presumably using the correct DHT for the first image component.

So, after the 0x0D code, I would expect that we'd find 13 bits for the value, ie. Diff[nBits] since nBits = 13. Were you trying to find a huffman code to match the Diff[nBits]? That could explain it. Following this value I think we'd proceed to the next huffman code which is 11111110, matching code 0x0B, etc.

great explanation! And yes, the ITU T81 specification goes a curious way to explain the DHT Encoding...

But you should change your diagramm of the huffman tree at "The diagram below shows the expansion of the first four rows...":

The huffman tree of a DQT is defined as "left-orientied". Only because this additional definition is made, the given amounts of leafs per level and the leaf-values are sufficient to describe a complete huffman tree. The picture shows a right-orientied tree.

Best regard from Stuttgart,

 Thank you very much VG!! It looks like I'll have to revisit the way I constructed the tree and update this. Thanks for pointing it out (it may take me a while to update with a baby taking much of my time these days! :)

First, thanks for such an awesome program and web page, I am about to finish a jpeg reader written in a bunch lines of python :-)

There is something that escapes to my understanding... using the data below... where is the 0xf0 that marks the ZRL in the last coefficient that I am pasting here?

Again thanks thanks and thanks for making my dream of decoding a jpeg me myself come true :-)

Expanded Form of Codes:
Codes of length 02 bits:
00 = 01 (Total Len = 3)
01 = 02 (Total Len = 4)
Codes of length 03 bits:
100 = 03 (Total Len = 6)
101 = 04 (Total Len = 7)
Codes of length 04 bits:
1100 = 05 (Total Len = 9)
Codes of length 05 bits:
11010 = 00 (EOB) (Total Len = 5)
11011 = 11 (Total Len = 6)
11100 = 06 (Total Len = 11)
Codes of length 06 bits:
111010 = 12 (Total Len = 8)
111011 = 13 (Total Len = 9)
Codes of length 07 bits:
1111000 = 21 (Total Len = 8)
1111001 = 31 (Total Len = 8)
1111010 = 41 (Total Len = 8)
1111011 = 61 (Total Len = 8)
Codes of length 08 bits:
11111000 = 51 (Total Len = 9)
11111001 = 22 (Total Len = 10)
11111010 = 14 (Total Len = 12)
11111011 = 07 (Total Len = 15)
Codes of length 09 bits:
111111000 = 71 (Total Len = 10)
111111001 = A1 (Total Len = 10)
111111010 = B1 (Total Len = 10)
111111011 = 32 (Total Len = 11)
111111100 = 42 (Total Len = 11)
111111101 = 82 (Total Len = 11)
111111110 = 23 (Total Len = 12)

Lum (Tbl #0), MCU=[0,0]
[0x0000015A.0]: ZRL=[ 0] Val=[ 80] Coef=[00= DC] Data=[0x 68 59 AB 46 = 0b (01101000 0------- -------- --------)]
[0x0000015B.1]: ZRL=[ 0] Val=[ 9] Coef=[01..01] Data=[0x 59 AB 46 57 = 0b (-1011001 -------- -------- --------)]
[0x0000015C.0]: ZRL=[ 0] Val=[ -10] Coef=[02..02] Data=[0x AB 46 57 EE = 0b (1010101- -------- -------- --------)]
[0x0000015C.7]: ZRL=[ 0] Val=[ -14] Coef=[03..03] Data=[0x AB 46 57 EE = 0b (-------1 010001-- -------- --------)]
[0x0000015D.6]: ZRL=[ 0] Val=[ 5] Coef=[04..04] Data=[0x 46 57 EE 4C = 0b (------10 0101---- -------- --------)]
[0x0000015E.4]: ZRL=[ 0] Val=[ 3] Coef=[05..05] Data=[0x 57 EE 4C A5 = 0b (----0111 -------- -------- --------)]
[0x0000015F.0]: ZRL=[ 1] Val=[ 4] Coef=[06..07] Data=[0x EE 4C A5 25 = 0b (11101110 0------- -------- --------)]

decoding manually 11101110 0 I get that 111011 -> 0x13 and not 0xf0!
In fact there is no 0xf0 in any table I have... And 0x13 is 19 in dec which gives me a 19bits number which is veery big

I am so close!!!! :-)
 Really great article but I have few questions. When we finish with generating Huffman tables, and Quantization tables, we have to decode the data from file itself. Lets suppose I have 8x8 color picture, and my Luminance DC and 4 AC elements are populated. Chrominance(Cb) DC and 4 AC elements are also populated, but my all Chrominance (Cr) 8x8 block is populated (i hit EOB). Consideration my Luminance and Chrominance (Cb) blocks are not populated, and my Chrominance(Cr) is populated, how do I continue with reading file? I continue with: Luminance DC, Luminance AC Chrominance(Cb) DC, Chrominance(Cb) AC, Chrominance(Cr) DC, Chrominance (Cr) AC. Or Luminance DC, Luminance AC Chrominance(Cb) DC, Chrominance(Cb) AC because my Chrominance(Cr) 8x8 block is populated?
 hi I have down loaded a huffman decoder vhdl code from opencore. it has given a test image for simulation can anyone please tell me how can i use this code to decompress other jpg image?
 Love your JPEG site and JPEGsnoop! :D

In your experience, how many encoders/applications use a *different* Huffman code set (bits, huffval) than the example specified in Annex K of the ITU T.81 standard?

How about applications that, by default, use an optimized code set, tailored to the image? (e.g. mode of IJG libjpeg where the optimized Huffman code analysis is enabled.)

 I haven't seen many digicams that use a different set of Huffman tables, but then again I haven't specifically searched the data set for it. As for applications, I believe Photoshop can produce an optimized Huffman code set, but I'm not aware of digicam encoders doing this (might be processor intensive). Unfortunately, the database I have compiled does not include Huffman code tables, so I can't easily check for this.
 problem with cert at source forge when svn'ing source for JPEGsnoop...

svn co jpegsnoop
Error validating server certificate for '':
- The certificate is not issued by a trusted authority. Use the
fingerprint to validate the certificate manually!
Certificate information:
- Hostname: *
- Valid: from Tue, 01 Feb 2011 03:25:10 GMT until Mon, 05 Mar 2012 04:22:59 GMT
- Issuer: GeoTrust, Inc., US
- Fingerprint: 94:74:b3:a9:54:ce:dc:e5:0d:d6:cf:86:b1:40:5a:48:b9:ea:15:de
(R)eject, accept (t)emporarily or accept (p)ermanently? r
svn: OPTIONS of '': Server certificate verification failed: issuer is not trusted (
please check... thank you.
 Hi there -- according to the sourceforge site, this message is often received after they have updated their certificate. Please see the "Server Certificate Verification Failed" section of the following page:
 Is there a tag somewhere that shows points to a different algorithm of decoding the huffman tree? In your 8x16 file, your algorithm works correctly. But if I look at another file as an unoptimized baseline jpeg the ORDER of the huffman codes with the SAME table are different.

For example, in the YAC table, the 3 4-bit values are ordered 17, 4, 0 in your file and then in another they are 0, 4, 17. If you follow your left to right algorithm of assigning codes to leaf nodes, it will construct 2 totally different tables if the codes are in different order.

I've noticed that a number of jpegs encode the tables in ascending order, which isn't always the correct order. So when decoding, it decodes an incorrect tree.

Is there missing logic from how you construct the tree?
 I must be making some mistake, or I am missing something important. I've attempted to decode a few 8x8 blocks with AC values by hand, but I am not getting results that make sense.

Is there a table for AC value encoding? (similar to table 5, but for AC values?). I am leaning towards this because you CAN have negative AC coefficients, as shown in the orange MCU figure.

Much thanks for your help.
 Yes, for the AC coefficient decode, you would first find the first huffman code (per AC DHT) that matches. Make sure you enable "DHT expand" to see the bitstrings. Once you have foudn the code that matches, you break down the two halves of the code into a ZRL and the number of value bits. You can then use Table 5 to determine the AC value (but ignore the ZRL portion of the code). In other words, if your AC huffman code is 0x24, then look for an entry in that table for 0x04. Hope that helps!
 I have a simple question, that I happen to be struggling with.

Lets say theres an 8x8 block that has a colour change in it, lets say half red, half green. Now there will be AC components in the Luminance AND in the Chrominance, because of the changes.

Will the decoding be as such:

Code-> use Table -> get DC value

Code (zero run/non-zero)
Code (zero run/non-zero)
.... until you reach EOB

Code-> use Table -> get DC value

Code (zero run/non-zero)
Code (zero run/non-zero)
.... until you reach EOB

Code-> use Table -> get DC value

Code (zero run/non-zero)
Code (zero run/non-zero)
.... until you reach EOB

Is this how iterations work?

Much thanks in all your work putting the data out there.
 Hi Dan -- yes, that is essentially the way that it will work (assuming that there is no chroma subsampling). The "until EOB" can actually be "or 64 components". The transition in color will translate into a low-frequency component in the DCT domain.
 Hi Calvin,

probably you might now the answer to this question:

do jpegtran produce the best Huffman encoding (read the smallest possible file size)?

The reason I'm asking this question is because PNG images usually can be further compressed, as shown here for instance:
 Sorry, I'm not familiar enough with the different PNG compression schemes to give an educated comment.
 There is any implemmentation for this issue using GPUs (OpenCL).

 Thank you for that fast response, so it's just possible to decode the part of the image between 2 restart markers?

The fastest way i've found is to use the imagemagick command line tool "stream" that can output a 1000x1000 region in 6 seconds from a 30000x30000 image.

I've also tried to use java ImageReader that can extract the first block of an image in 200ms without using more then 50mb of ram, but that's only on the top-left part of the image, when it process the bottom-right part it goes much slower, near 15 seconds.

I don't get how it works because if the image is bigger the top-left part is always done very fast.
 The easiest solution that will work for all JPEG images is to perform the huffman decode but skip the IDCT, dequantization and color conversion. Have a look at the open-source jpegtran to see how this can be done. What you have observed with ImageReader implies that it is always decoding ht entire image (always starting from top-left corner). If what you are trying to extract is top-left, it will be much faster; but if you are trying to extract the bottom-right, it will need to decode the entire image first.
 Thank you for this great article, i have to extract just a part of a jpeg, without decoding the whole image.

You think is this possible just taking a part of the file and decode that?

Thank you!
 In general, no, it is not possible to decode just part of a file to decode (unless restart markers are inserted in the stream).
 Thanks for this article! It makes JPEG a lot easier to understand! :-)

On remark though: I had the impression Table 1 was complete, but, I think it is missing one entry:
8 bits 11111 110 0B
 Thanks Sathananthan... can you clarify the entry for me? I currently have (in Table 1):
7 bits 	1111 110 	0A
8 bits 	1111 1110 	0B
What you have written "8 bits 11111 110 0B" appears to be the same as my last row. thx.
2011-02-02Claudio Freitas
 Hi Calvin,

I think in the Table 2, the bits for the codes 21 and 12 are shifted. Just to let you know.

Also, thank you very much for such informative tutorial!
Claudio Freitas
 Hello Claudio -- thanks for pointing out the possible error. However, I re-ran JPEGsnoop DHT expansion on the sample image and I still get the same results for codes 21 and 12 in Table 2:
    Codes of length 05 bits:
      11010 = 05                         (Total Len = 10)
      11011 = 21                         (Total Len =  6)
      11100 = 12                         (Total Len =  7)
Can you clarify what values you were expecting? Thanks.
 Very helpful tutorial, thanks lots :)
Any ideas on arithmetic encoding the dc and ac values?
 Excellent write up. I knew (when decoding an image) the Huffman Table built a B tree but I couldn't find how we assigned values to the leaf, of course now it seems obvious.

Thank you.
I know you must be busy answering JPEG questions all the time, so here's another one. I thought I had a firm grasp on Huffman coding, but apparently not. I noticed when I ran JPEG snoop that some of the Huffman bit lengths and values don't match up! Below is an example of what I'm talking about. How can "11" be a 4 bit value? 4 bit values can only pertain to the 2nd hex value in the string. "11" denotes 10001 which is 5 bits. Right? My confidence in understanding Huffman coding has been diminished. Care to clarify? I would appreciate any feedback! Thanks.
Codes of length 04 bits (003 total): 00 04 11
 Hi Nick -- What you are actually looking at is the list of Huffman Codes, not the Huffman bit-strings. To see the actual bit string that represents code "00", "04" and "11", enable "DHT Expand" mode under Options. This will then show you that these 3 Huffman codes each are represented by 4-bit binary strings. Hope that helps!
 Great tutorial !

I am writing decoder for jpeg and your tutorial helped me a lot. I managed to decode your 16x8 picture and some other color pictures.
But I have a problem with one 40x20 color jpeg, I wonder if you could help me to read it step by step.
 Think you might find this interesting -

Wonderful pages, by the way!
 Thanks... Regarding that sample image (, I don't believe the differences have anything to do with color space since the image is untagged (does not have any such markers). Instead, I believe it could be due to one of two things: 1) Bug in JPEG decoder within some browsers with the way that they handle the implicit EOB (End of Block) when a MCU has defined all 64 coefficients, rather than encoding the EOB code. I notice that the first two MCUs have luminance coefficients that consume all 64 entries, which is relatively uncommon. 2) Color conversion clips differently. This would seem to be a very likely reason except that I didn't detect any clipping when converting from YCC to RGB.
 Hi Calvin,

If all the AC component are non-zero. Does it needs the EOB code (the last AC value is non-zero)?
 If all 63 coefficients are non-zero, then you don't need to have an EOB code -- it becomes implicit. In JPEGsnoop, I report this as an "EOB64" instead of an "EOB".
 Hi Calvin, I have read your tutorial for JPEG huffman coding. I have two quick questions for you

I will appreciate it very much if you can answer it.

(1) In the Huffman coding, is the DC value is dependent or independent between adjacent 8 by 8 blocks?

(2) When there are 1 or more zeros in the AC value between 2 non-zeros values, how does the entropy coding encode the 0s with the following non-zero value?

thanks a lot
 1) The DC value is dependent upon the previous 8x8 block's DC value because it is based on an incremental adjustment from the previous block.
2) When there are 1 or more zeros in the AC value between 2 non-zero values, the encoder uses the "zero run length" nibble of the Huffman Code to indicate the number of zero coefficients.

i'm doing well untill the dequantization stage and then after the idct and color conversion process i dont get the right RGB values like the RGB values shown in jpegsnoop. i think i miss something here.

i do the idct process by multipying the dequantized values with the idct matrix element by element and then i do the level shift and convert it to RGB values using your formula.

can you give me some hints??? thanks
 i am now working on my project to make a jpeg decoder using c. can you tell me how to generate the binary tree for the huffman decoding stage??


Leave a comment or suggestion for this page:

(Never Shown - Optional)