Return to Digital Photography Articles

JPEG Huffman Coding Tutorial

After quantization, huffman / entropy coding is one of the more significant contributors to the file size savings in JPEG compression. This page provides a tutorial on how the huffman coding works in a JPEG image. If you have ever wondered how JPEG compression works, this may provide you with some detailed insight.

Why I wrote this tutorial

In attempting to understand the inner workings of JPEG compression, I was unable to find any real details on the net for how Huffman coding is used in the context of JPEG image compression. There are many sites that describe the generic huffman coding scheme, but none that describe how it will appear in a JPEG image, after factoring in the DHT tables, interleaved chroma subsampling components, etc. While it is relatively easy to understand the JPEG marker extraction, the Start of Scan data segment is the least understood and most important part of the image data. Therefore, I decided to create a page to walk through a decompression example. Hopefully others will find this useful!

The relevant sections in the JPEG Standard are quite obscure -- enough so that I set out to analyze several JPEG images to reverse-engineer how the huffman coding was being applied in a JPEG JFIF file.

Latest Update:

[12/03/2007]: Corrected typo in text near Table 5 (code 00101). Added JPEGsnoop output (at end of Tutorial).
[01/27/2007]: Added section describing how to expand DHT into bit strings.

The Goal

The goal of this tutorial is to take a simple JPEG image and try to decode the compressed image data by hand, learning how the Huffman compression scheme works in the process.

Simplest JPEG Example

Most digital photos are full-color natural/organic images, which means that all three image components (one luminance and two color channels) will all have both low and high-frequency content. In addition, nearly all digital photos use chroma subsampling, which makes the extraction process a little more complicated. For the purposes of showing the basic huffman extraction, we will start with the simplest of all JPEG images:

  • Grayscale - no content in the two color channels
  • Solid color in each MCU - By making all pixels in an 8x8 block the same color, there will be no AC components.
  • No chroma subsampling - Makes scan data extraction simpler: Y, Cb, Cr, Y, Cb, Cr, etc.
  • Small Image - Total image size is 16x8 = two MCUs or blocks. This makes the extraction in this tutorial shorter.

Creating the Image

For the purposes of this tutorial, my working image will simply be a 16x8 pixel image, with two solid color blocks: one black and the other white. Note that each block is 8x8 pixels in size. The actual image is here: . If you want to download it, right-click and select Save Picture As...

Creating the sample image was trivial, working at 1600% view. Important that dimensions and any changes in the content are on 8-pixel boundaries. Overall image dimensions should be a multiple of 8 pixels as well, in both directions. The image below is a magnified version with a grid overlayed.

Once the image was created, it was saved with Photoshop CS2's Save for Web... command. This kept the file size down as it discards other extraneous file information (metadata, etc.) that is not relevant to this tutorial. Some other important points:

  • Use Save for Web - Reduces total file content to minimal subset.
  • Use Quality level 51+ - This ensures that there is no chroma subsampling enabled in the JPEG encoding process, according to the way that Photoshop Save for Web operates. I used quality 80 for this example.
  • Turn Optimized Off - For the purposes of this example, I think it is important to work with realistic huffman tables, not degenerate single-entry tables. Therefore I recommend that JPEG Huffman Table Optimization is left off.
  • Other settings: Blur off, Progressive off, ICC profile off.

Grayscale Photoshop Images

It should be noted that when you save a JPEG image from within Photoshop it always contains three components (Y, Cb, Cr). If you change the mode to grayscale (via Mode->Grayscale), the three components are still saved, even though the JPEG standard supports an image with only one component (which would be assumed to be grayscale).

What is Huffman Coding / Entropy Coding?

Huffman coding is a method that takes symbols (e.g. bytes, DCT coefficients, etc.) and encodes them with variable length codes that are assigned according to statistical probabilities. A frequently-used symbol will be encoded with a code that takes up only a couple bits, while symbols that are rarely used are represented by symbols that take more bits to encode.

A JPEG file contains up to 4 huffman tables that define the mapping between these variable-length codes (which take between 1 and 16 bits) and the code values (which is an 8-bit byte). Creating these tables generally involves counting how frequently each symbol (DCT code word) appears in an image, and allocating the bit strings accordingly. But, most JPEG encoders simply use the huffman tables presented in the JPEG standard. Some encoders allow one to optimize these tables, which means that an optimal binary tree is created which allows a more efficient huffman table to be generated.

For a reasonable explanation of how it works, please see this example of Huffman coding an ASCII string and the overview from Wikipedia.

For more details, please see my article on Optimized JPEGs - optimizing the huffman tables, particularly the first introductory sections and the section near the end titled "Standard Huffman Tables".

Decoding the JPEG Scan Data

Using JPEGsnoop

For those who are trying to understand the complex huffman decoding in a JPEG image, I'm happy to report that JPEGsnoop can now report all of the variable length code decoding for each MCU (use the Detailed Decode option). For the sample output, scroll to the bottom of this tutorial.

Decoding by Hand

The following is the decode method done by hand, which is obviously impractical for most images, but is shown here in detail to help one learn the process involved.

 

The above hex dump datastream shows the beginning of the Start of Scan (SOS marker 0xFFDA) marked in yellow, followed by some additional details in green and then the actual scan data selected in dark blue. Finally, the image is terminated with an End of Image (EOI marker 0xFFD9). So, the huffman-coded data content is only 9 bytes long.

Comparison of Compression File Sizes

For the sake of comparison, the original image (16 pixels by 8 pixels) contains a total of 128 pixels (2 MCUs). With 8 bits per channel (RGB), this corresponds to an uncompressed image size of 384 bytes (128 pixels x 8 bits/channel x 3 channels x 1 byte/8 bits). Clearly, using a run-length encoded format such as GIF would have produced even more image compression in examples like this (although GIF actually takes 22 bytes to code the stream because there are 16 separate runs). JPEG is not really designed to be optimized for this type of synthetic (non-organic) image.

If one uses optimized JPEG encoding, it is possible to reduce the image content size even further. In the example image, the optimized version has much smaller huffman tables (DHT) and shorter bitstrings to represent the same codewords. The net effect is that the image content size is reduced even further (to 7 bytes).

File FormatTotal Size Overhead Size Image Content Size
BMP (Uncompressed) 440 Bytes 56 Bytes 384 Bytes
JPEG653 Bytes 644 Bytes 9 Bytes
JPEG (Optimized) 304 Bytes 297 Bytes 7 Bytes
GIF60 Bytes 38 Bytes 22 Bytes

Scan Data Decode

The scan data is:

FC FF 00 E2 AF EF F3 15 7F

To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the 0xFF byte, followed by a non-zero byte). If the huffman coding scheme needed to write a 0xFF byte, then it writes a 0xFF followed by a 0x00 -- a process known as adding a stuff byte.

For our extraction purposes, we will replaceme any padding bytes (0xFF00 with 0xFF):

FC FF E2 AF EF F3 15 7F

The expectation is that image content is 3 components (Y, Cb, Cr). Within each component, the sequence is always one DC value followed by 63 AC values.

For each MCU, with no chroma subsampling, we would expect the following data to be encoded:

Section123456
ComponentYCbCr
AC / DCDCACDCACDCAC

Note that some people get the order of the chrominance channels mixed up, and assume that it is YCrCb instead.

The figure below shows what the DCT matrix from a single MCU (8x8 pixel square) in a digital photo typically looks like. These are the entries after quantization, which has caused many of the higher-frequency components (towards the bottom-right corner of the matrix) to become zero. By the distribution of values in the frequency-domain matrix representation, it is possible to determine that the 8x8 pixel square had very little high-frequency content (i.e. it had only a gradual intensity / color change).

The DC component represents the average value of all pixels in the 8x8 MCU. Since we have deliberately created an image where all pixels in the 8x8 block are the same, we expect this value to represent either the black or white "color". The code provided in the DC entry (#0) indicates a huffman-encoded size (e.g. 1-10 bits) which is the number of bits needed to represent the average value for the MCU (eg. -511...+511).

Note that the DC component is encoded as a relative value with respect to the DC component of the previous block. The first block in the JPEG image is assumed to have a previous block value of zero.

Following the single DC component entry, one or more entries are used to describe the remaining 63 entries in the MCU. These entries (1..63) represent the low and high-frequency AC coefficients after DCT transformation and quantization. The earlier entries represent low-frequency content, while the later entries represent high-frequency image content. Since the JPEG compression algorithm uses quantization to reduce many of these high-frequency values to zero, one typically has a number of non-zero entries in the earlier coefficients and a long run of zero coefficients to the end of the matrix.

For the purposes of this tutorial, I have deliberately created an image that has constant color across all 8x8 pixels in each of the two MCU. Because there are no changes in value across each 8x8 pixel region, there is no AC (or higher frequency content) within the block. As a result, all 63 entries in the AC portion are expected to be zero (unlike the figure above). This allows us to focus on the DC component, which we do expect to change from MCU to MCU block.

The hex string shown earlier (after removal of padding bytes) can be represented in binary as the following:

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111

Extract Huffman Code Tables

Using a utility such as JPEGsnoop, you can extract the Huffman tables from the JPEG image file. Often, you will find four huffman table entries (tagged with a DHT marker):

  • DHT Class=0 ID=0 - Used for DC component of Luminance (Y)
  • DHT Class=1 ID=0 - Used for AC component of Luminance (Y)
  • DHT Class=0 ID=1 - Used for DC component of Chrominance (Cb & Cr)
  • DHT Class=1 ID=1 - Used for AC component of Chrominance (Cb & Cr)

The huffman compression tables are encoded in a somewhat confusing manner. Although you can draw out the binary tree by hand, it will be easier if you rely on a tool such as JPEGsnoop to generate all of the binary comparison strings for each huffman code in all four DHT sections.

The following four tables were extracted from the JPEG file that was created by Photoshop for the purposes of this tutorial. Other JPEG images may be reliant on different DHT tables, so it is important to extract them prior to analyzing the file. Note that turning on JPEG Optimization will create vastly different Huffman tables, with far fewer entries. For a point of comparison, the image described in this tutorial would only need optimized huffman tables of one entry each to represent our image content.

NOTE: It is important to realize that in each case the DHT entries in the JPEG file only list the Length and Code values, not the actual Bit String mapping. It is up to you to rebuild the binary tree representation of the DHT table to derive the bit strings! Please see the DHT Expansion section near the end of this tutorial for more details.

Table 1 - Huffman - Luminance (Y) - DC

LengthBitsCode
3 bits 000
001
010
011
100
101
110
04
05
03
02
06
01
00 (End of Block)
4 bits 1110 07
5 bits 1111 0 08
6 bits 1111 10 09
7 bits 1111 110 0A

Table 2 - Huffman - Luminance (Y) - AC

LengthBitsCode
2 bits 00
01
01
02
3 bits 100 03
4 bits 1010
1011
1100
11
04
00 (End of Block)
5 bits 1101 0
1101 1
1110 0
05
21
12
6 bits 1110 10
1110 11
31
41
... ... ...
12 bits ...
1111 1111 0011
...
...
F0 (ZRL)
...
... ... ...
16 bits ...
1111 1111 1111 1110
...
FA

Table 3 - Huffman - Chrominance (Cb & Cr) - DC

LengthBitsCode
2 bits 00
01
01
00 (End of Block)
3 bits 100
101
02
03
4 bits 1100
1101
1110
04
05
06
5 bits 1111 0 07
6 bits 1111 10 08
7 bits 1111 110 09
8 bits 1111 1110 0A
9 bits 1111 1111 0 0B

Table 4 - Huffman - Chrominance (Cb & Cr) - AC

LengthBitsCode
2 bits 00
01
01
00 (End of Block)
3 bits 100
101
02
11
4 bits 1100 03
5 bits

1101 0
1101 1

04
21
6 bits 1110 00
1110 01
1110 10
12
31
41
... ... ...
9 bits ...
1111 1100 0
...
...
F0 (ZRL)
...
... ... ...
16 bits ...
1111 1111 1111 1110
...
FA

 

Table 5 - Huffman DC Value Encoding

The following table shows how the bit fields that follow a DC entry can be converted into their signed decimal equivalent. To use this table, start with the DC code value and then extract "Size" number of bits after the code. These "Additional Bits" will represent a signed "DC Value" which becomes the DC value for that block. Note that this table applies to any JPEG file -- this table is not written anywhere in the JPEG file itself.

For example, let's assume that one was about to decompress a chrominance DC entry. If the previously-decoded "DC Code" was 05, then we must extract 5 bits following the code bits. If the next 5 bits were 00101, then this can be interpreted as decimal -26. The bits 10001 would be +17 and 11110 would be +30.

DC Code Size Additional Bits DC Value
01 1 01 -11
02 2 00,0110,11 -3,-22,3
03 3 000,001,010,011100,101,110,111 -7,-6,-5,-44,5,6,7
04 4 0000,...,01111000,...,1111 -15,...,-88,...,15
05 5 0 0000,... ...,1 1111 -31,...,-16 16,...,31
06 6 00 0000,... ...,11 1111 -63,...,-32 32,...,63
07 7 000 0000,... ...,111 1111 -127,...,-64 64,...,127
08 8 0000 0000,... ...,1111 1111 -255,...,-128 128,...,255
09 9 0 0000 0000,... ...,1 1111 1111 -511,...,-256 256,...,511
0A 10 00 0000 0000,... ...,11 1111 1111 -1023,...,-512 512,...,1023
0B 11 000 0000 0000,... ...,111 1111 1111 -2047,...,-1024 1024,...,2047

Block 1 - Luminance

Luminance (Y) - DC

Referring to the Y(DC) table (Table 1), we start with the first few bits of the coded stream (1111 1100 1111...) and recognize that code x0A matches the bit string 1111 110.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 0A

This code implies that hex A (10) additional bits follow to represent the signed value of the DC component. The next ten bits after this code is 0 1111 1111 1. Table 5 above shows the DC values represented by these "additional bits" -- in this case, the bit string corresponds to a value of -512.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Value: -512

Our progress so far:

Bits1111 1100 1111 1111 1 110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
MCU1???
ComponentY ???
AC/DC DC ???
Value -512 ???

Luminance (Y) - AC

After the DC component, we begin the 63-entry AC matrix for the Y Luminance. This uses a different Huffman table (Table 2).

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

In the above huffman code table, the code 1100 corresponds to an EOB (End of Block). Therefore, the AC components was cut short early (no other codes). This means that all 63 entries of the matrix (all entries except the 1st entry, which is the DC component) are zeros. Since we have finished the luminance component, we then move on to the chrominance components (Cb and Cr).

Bits1111 1100 1111 1111 1 1100010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
MCU1???
ComponentY ???
AC/DC DC AC ???
Value -512 0???

Block 1 - Chrominance

Chrominance (Cb) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

End of chrominance DC, start on AC.

Chrominance (Cb) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

Again, the AC is terminated right away. Now, we move on to the second chrominance channel, Cr.

Bits1111 1100 1111 1111 1 11000101010 1111 1110 1111 1111 0011 0001 0101 0111 1111
MCU1???
Component Y Cb ???
AC/DC DC AC DC AC ???
Value -512 000???

Chrominance (Cr) - DC

Refer to Table 3 for the relevant Huffman codes.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

This marks the end of the DC.

Chrominance (Cr) - AC

Refer to Table 4 for the relevant Huffman codes.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Code: 00 (EOB)

This marks the end of the AC.

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 0011 0001 0101 0111 1111
MCU1???
ComponentY Cb Cr ???
AC / DC DC AC DC AC DC AC ???
Value -51200000???

Block 2 - Luminance

Luminance (Y) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 0A

This code indicates that the value is stored in the next ten bits (A in hex is 10 in decimal):

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> Value: +1020

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 0011 0001 0101 0111 1111
MCU12???
ComponentY Cb Cr Y???
AC / DC DC AC DC AC DC AC DC ???
Value -512 00000 +1020 ???

Luminance (Y) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: EOB

The end-of-block indicator means that all remaining values are zero. Since we haven't even started with the first value, all 63 values can be interpreted as zero. This means that there is no non-DC image content, which is to be expected since all 64 pixels (8x8) in the block are the same color.

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 00 1100 01 0101 0111 1111
MCU12???
ComponentY Cb Cr Y???
AC / DC DC AC DC AC DC AC DC AC ???
Value -512 00000 +1020 0???

Block 2 - Chrominance

Chrominance (Cb) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cb) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cr) - DC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Chrominance (Cr) - AC

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111
=> CODE: 00 (EOB)

Bits1111 1100 1111 1111 1 110001 0101 01111 1110 1111 1111 00 1100 0101010111 1111
MCU12X
ComponentY Cb Cr Y Cb Cr X
AC / DC DC AC DC AC DC AC DC AC DC AC DC AC X
Value -512 00000 +1020 00000X

Remainder

Because the scan data must end on a byte boundary, there may be a remainder of bits that will simply be tossed. In this case we see 6 bits (all-1's) that will be discarded.

1111 1100 1111 1111 1110 0010 1010 1111 1110 1111 1111 0011 0001 0101 0111 1111

Conversion to Spatial Domain

Now that we have the DCT (Discrete Cosine Transform) values, we can try to determine what the content of the image was in the spatial domain. Remember that the DC & AC values that we decompressed from the huffman coding are the frequency-domain representation of the image, not the real YCbCr values.

Given that all of the values in the Cb and Cr channels (chrominance) are zero, we can assume that the image is grayscale and instaed focus on the luminance values (brightness).

Second, all of the AC component values are zero, which means that there is no higher frequency content in the images -- in fact, we can determine that all of the image data in each 8x8 block is of the same color & intensity (i.e. only the DC component remains).

We can determine the following:

BlockEncoded (Relative) DC Value
1Y = -512
2 Y = +1020

Relative DC to Absolute DC

Note that the DC components are encoded as a difference from the previous block (with the first block assumed to start relative to zero). Therefore, we know that block 1 has a DC value of -512, but that block 2 has a DC value of +1020 relative to block 1. So, we now know:

BlockActual (Absolute) DC Value
1Y = -512
2 Y = +508

DCT to RGB

Finally, we want to convert the DCT DC value to an RGB value. Assuming that the gain of the DCT transform is 4, we divide the values by 4 to get block1 = -128, block2 = +127.

Now, we have to convert between YCbCr and RGB. Please refer to the formulae provided on the JPEG color conversion page.

Color conversion yields RGB values of:

BlockRGB Value (hex)RGB Value (decimal)
1(0x00,0x00,0x00)(0,0,0)
2 (0xFF,0xFF,0xFF) (255,255,255)

... and these are the original values that were used to create the JPEG image!

Expansion of DHT Table into binary bit strings

Note that the DHT tables that appear in the JPEG file only define the number of codes for each bitstring length, followed by a sequence of all the code words. The bit strings that are used to represent each of these code words is not included! It is the job of the JPEG decoder to derive these bit strings from the little information that is provided.

There are many methods in which the generation of the binary strings is performed in a decoder, and most of them are heavily optimized for performance (e.g. the djpeg huffman routines). These implementations are quite difficult to learn from. I find it much more instructive to generate these bit sequences by drawing out the huffman binary tree by hand.

In an extreme simplification, binary trees are composed of a root node which has a left and right branch to other nodes. Each of these nodes can either be a leaf node (the end of a branch) or split further into two more nodes.

The actual DHT in the JPEG file:

The following was extracted directly from the DHT (Define Huffman Table, shown above in Table 2: Luminance AC) in the JPEG file using JPEGsnoop.

  Class = 1 (AC Table)
  Destination ID = 0
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (003 total): 11 04 00 
    Codes of length 05 bits (003 total): 05 21 12 
    Codes of length 06 bits (002 total): 31 41 
    Codes of length 07 bits (004 total): 51 06 13 61 
    Codes of length 08 bits (002 total): 22 71 
    Codes of length 09 bits (006 total): 81 14 32 91 A1 07 
    Codes of length 10 bits (007 total): 15 B1 42 23 C1 52 D1 
    Codes of length 11 bits (003 total): E1 33 16 
    Codes of length 12 bits (004 total): 62 F0 24 72 
    Codes of length 13 bits (002 total): 82 F1 
    Codes of length 14 bits (006 total): 25 43 34 53 92 A2 
    Codes of length 15 bits (002 total): B2 63 
    Codes of length 16 bits (115 total): 73 C2 35 44 27 93 A3 B3 36 17 54 64 74 C3 D2 E2 
                                         08 26 83 09 0A 18 19 84 94 45 46 A4 B4 56 D3 55 
                                         28 1A F2 E3 F3 C4 D4 E4 F4 65 75 85 95 A5 B5 C5 
                                         D5 E5 F5 66 76 86 96 A6 B6 C6 D6 E6 F6 37 47 57 
                                         67 77 87 97 A7 B7 C7 D7 E7 F7 38 48 58 68 78 88 
                                         98 A8 B8 C8 D8 E8 F8 29 39 49 59 69 79 89 99 A9 
                                         B9 C9 D9 E9 F9 2A 3A 4A 5A 6A 7A 8A 9A AA BA CA 
                                         DA EA FA 
    Total number of codes: 162

So, how do we create the binary bit-strings for each of these codes?

Well, we start to build a binary tree, creating new branches and putting the code words into leaf nodes of the tree. Row 1 of the tree contains code words that only require 1 bit to encode, row 2 contains code words (leaf nodes) that only require 2 bits to encode and so on.

We first start with row 0:

  • Row 0 (the root node) is almost always a parent node, creating a left and a right branch down to the next row. We label the left branch 0 and the right branch 1.
  • At row 1, we try to fill in any of the nodes (there are 2 at this level) with code words that take 1 bit to encode. We see from the DHT that there are no codes of length 1 bit. So, for each of the two nodes we spawn off a left and right node below (on row 2), creating a total of 4 nodes. Again, for these branches, we label the left one 0 and the right one 1.
  • At row 2, we will try to fill in any codes of length 2 bits, starting from left to right. This time we see in the DHT that there are two codes that can be encoded with bit strings of length 2. So, we take the first code value x01 (hex) and place this in the first node, making it a leaf node (no further branches will come from it). We then take the second code value x02 and place this in the second node, making it too a leaf node. At this point we have two more nodes left in this row of the tree but no more codewords listed in the DHT for this bitstring length. Therefore, we create two branches for each of the remaining nodes of this row. Since two nodes are left, this will create a total of 4 branches, meaning that there will be again 4 nodes in the third row of the binary tree.
  • At row 3, we have four nodes available, but the DHT indicates only one codeword uses 3 bits to encode. So, we place x03 at the leftmost of these nodes, and we create branches for each of the remaining three nodes (making 6 nodes for row 4).
  • At row 4, we have six nodes available, but the DHT indicates only three codewords use 4 bits to encode. This time we terminate three nodes (making them leaf nodes) and we further extend the other three down to row 5.

This process continues until we have used all of the codewords defined in the DHT table. The diagram below shows the expansion of the first four rows of the above DHT.

Binary Tree Expansion of DHT

Once the binary tree has been completed, you can read off the bit strings for each code word by combining the bit labels of each branch on the path down from the root node. For example, code word 0x04 is encoded by the binary bit string 1011 because we need to take branch 1 from the root node, 0 from the node on row 1, 1 from the node on row 2 and branch 1 from the node on row 3.

This process can be quite laborious, and the binary tree often takes on a shape that would be difficult to draw through to completion. To make this easier, I have added a feature, DHT Expand, to JPEGsnoop that determines the binary bit strings for each of the code words appearing in the DHT table.

Below is the expansion of the first few rows of the table, using JPEGsnoop with the DHT Expand mode enabled.

  Expanded Form of Codes:
    Codes of length 02 bits:
      00 = 01
      01 = 02
    Codes of length 03 bits:
      100 = 03
    Codes of length 04 bits:
      1010 = 11
      1011 = 04
      1100 = 00 (EOB)
    Codes of length 05 bits:
      11010 = 05
      11011 = 21
      11100 = 12
    Codes of length 06 bits:
      111010 = 31
      111011 = 41
    Codes of length 07 bits:
      1111000 = 51
	   ...
    Codes of length 15 bits:
      111111111000100 = B2
      111111111000101 = 63
    Codes of length 16 bits:
      1111111110001100 = 73
      1111111110001101 = C2
	   ...
      1111111111111100 = DA
      1111111111111101 = EA
      1111111111111110 = FA

Implementations in real JPEG decoders optimize this mechanism heavily as the bitstring search / parsing process is not a trivial task in processors designed to work with 32-bit words or bytes. Most of them end up creating large lookup tables that define lower and upper bounds for a match of a particular bitstring / code.

More Realistic JPEG Images

Obviously the above is an extremely simple example JPEG image. However, real images will have some other characteristics that you will encounter:

  • Chroma subsampling - You can expect that most photos will have 2x1 chroma subsampling, which will mean that the decoding sequence per MCU will be Y1 Y2 Cb Cr instead of Y Cb Cr.
  • AC components - You most certainly will have non-zero AC coefficients, unlike the above. In such a case, you will generally have a few non-zero values, which will eventually be terminated with a ZRL (code word 0xF0) which indicates a run of 16 zeros and then probably a EOB (code word 0x00).

Some more detail regarding Huffman coding with chroma subsampling and other factors are described in the JPEG decoder page.

JPEGsnoop Detailed Decode Output

Running JPEGsnoop on the image shown above, with the Scan Segment->Detailed Decode option enabled, the following output is shown:

*** Decoding SCAN Data ***
  OFFSET: 0x00000282
  Scan Decode Mode: Full IDCT (AC + DC)


    Lum (Tbl #0), MCU=[0,0]
      [0x00000282.0]: ZRL=[ 0] Val=[ -512] Coef=[00= DC] Data=[0x FC FF 00 E2 = 0b (11111100 11111111 1------- --------)] 
      [0x00000285.1]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x E2 AF EF F3 = 0b (-1100--- -------- -------- --------)] EOB
                      DCT Matrix=[-1024     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[0,0]
      [0x00000285.5]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x E2 AF EF F3 = 0b (-----01- -------- -------- --------)] EOB
      [0x00000285.7]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x E2 AF EF F3 = 0b (-------0 1------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[0,0]
      [0x00000286.1]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x AF EF F3 15 = 0b (-01----- -------- -------- --------)] EOB
      [0x00000286.3]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x AF EF F3 15 = 0b (---01--- -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]


    Lum (Tbl #0), MCU=[1,0]
      [0x00000286.5]: ZRL=[ 0] Val=[ 1020] Coef=[00= DC] Data=[0x AF EF F3 15 = 0b (-----111 11101111 111100-- --------)] 
      [0x00000288.6]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x F3 15 7F FF = 0b (------11 00------ -------- --------)] EOB
                      DCT Matrix=[ 2040     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[1,0]
      [0x00000289.2]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x 15 7F FF D9 = 0b (--01---- -------- -------- --------)] EOB
      [0x00000289.4]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x 15 7F FF D9 = 0b (----01-- -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]

    Chr(0) (Tbl #1), MCU=[1,0]
      [0x00000289.6]: ZRL=[ 0] Val=[    0] Coef=[00= DC] Data=[0x 15 7F FF D9 = 0b (------01 -------- -------- --------)] EOB
      [0x0000028A.0]: ZRL=[ 0] Val=[    0] Coef=[01..01] Data=[0x 7F FF D9 00 = 0b (01------ -------- -------- --------)] EOB
                      DCT Matrix=[    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
                                 [    0     0     0     0     0     0     0     0]
				
				

 


Reader's Comments:

Please leave your comments or suggestions below!
2008-03-12Pavan
 I am simply grateful for this tut..
 Thanks!
2008-03-07Giri
 if mcu have 3 component y1,cb1,cr1....yn that is called as scan then what is ECS i m confused of that frame marker many thks if u reply in pictorial form (SCAN< ECS<MCU<RST) ? , giri
 Giri -- The 3 components make up 1 MCU. There are a large number of MCUs per ECS (Entropy-Coded Segment). If there are no restart markers, then the entire image ends up being a single ECS. Otherwise, there are a predefined number of MCUs in each ECS, and a Restart marker (RST) appears between each ECS.

The sequence of one or more ECS makes up a single SCAN. Most images only have a single scan in the FRAME. The best diagram reference for this is in Figure B.2 "Syntax for sequential DCT-based, progressive DCT-based, and lossless modes of operation" from the ITU-T.81 / ISO 10918-1 (dated 1993).

Hope that helps... if you want any clarification, let me know.
2008-01-28Laurent Bonetto
 Hi,
I had to write a JPEG encoder from scratch for my company and I just wanted to tell you how useful your site was, both in terms of the explanation you provided, and for the jpeg-snoop utility that allowed me to spot very quickly a number of mistakes I had in my encoded bit string.

It just took me about 15 days and I don't think I could have done it that fast without these precious resources. I sent you a small donation to thank you for what you provided.

As an aside, I also love photography and diving (I am an adept of under-water hockey -- do you know about it?) so it looks like we have a few things in common apart from making sense of bit strings of zeros and ones!
Laurent
 Hey Laurent, thanks very much for the kind words and contribution! I'm glad to hear that it proved to be useful for you -- as always, if there are ways that the tool can be more useful, be sure to suggest an idea. As for the diving -- some of our most competitive freedivers are actually great underwater hockey players! I have yet to give it a try, though!
2008-01-23Rahul Zutshi
 I have a query in decoding the AC coefficients.
Suppose we decode a MCU block of 8*8 and we have got more than 64 coefficients without getting an end of block.
How do we go about the situation in that case?
 If the decoder reads 64 coefficients without observing an EOB, it automatically assumes that the next coefficient is for the next MCU. If this is a wrong assumption, then the file must have been corrupted or the decoder is broken.
2008-01-22Malcolm
 I just wished to say that this tutorial is great! I'm writing an app that should work with JPEG and was searching in the Internet for something like this. After reading damn huge specification and trying to make it out this tutorial really makes things a lot clearer. Thanks!

I'll keep in touch for the new articles.
 Thanks! If you ever run into any difficulties, be sure to ask.
2008-01-21Rahul
 hi..
Is there a possibility that while encoding, we can come across some codes which are not available in the standard huffman tables. ??
If yes, what is the solution to the problem.
 If you meant encoding, then no. The DHT table should be written to contain all possible code values that appear in the scan segment. If you actually meant decoding; again, no, you should never encounter a huffman code that wasn't already defined in the DHT tables. This could mean that either a) your decoder has got out of sync or b) the file has become corrupted. I have done a lot of research in dealing with corrupted images, and it is extremely difficult for a decoder to get back into sync.
2008-01-16Antonio Barbra
 Ok,
I can follow your examples to this point up to the step of decoding the DC luminance on this page but once i get get to the AC i am not sure how to decode. Do i decode it the same way ass the DC meaning do i find the value for the number of additional bits the same way as i did with the DC and so forth?
 The same basic process is used to decode the AC coefficients, but there are a couple differences:
  • Different tables are used (with typically 162 huffman codes instead of 12)
  • Keep a count of the number of coefficients (VLC codes) that you have read for the current MCU. If you have read an EOB code or have reached 64 coefficients (DC coefficient is counted as 1), then you move on to the next MCU. Remember to include the ZRL (zero run length) values as coefficients too.
2008-01-13Rahul
 When we rotate a JPEG image ....
What are the changes done on the quantization table?
How do we optimize the hufman table ....??

(While rotating a JPEG image ... I used the same hufman table as in the original image...but then I found some values.. for which codes were not available.)
 When you rotate an image, you will need to rotate the Quantization Tables as well. This is done simply by rotating the 8x8 matrix in the same direction as the rotation.

To see an example, run JPEGsnoop on a image, save out the DQT entries, then (in Windows Explorer), rotate an image (losslessly using Windows Picture and Fax Viewer) and then rerun JPEGsnoop on the image.
2008-01-09Rahul
 Hi,
Continuing on my rotation query, I am now able to check all my MCU values from JPEG Snoop. I am getting same AC and DC coefficients.

But now my query is.... how do we construct the DCT matrix (which needs to be transposed and mirrored) for the rotation.

I am just constructing it from reordering it to the natural order.
Am I missing something here?

I checked in JPEG Snoop. How do we reach the values as shown by the DCT Matrix.?
 To go from the ordered coefficients to the DCT matrix as shown in JPEGsnoop, you simply perform 2 steps:
  • Reorder coefficients in zig-zag scan order
  • Multiply each coefficient in the matrix with the equivalent coefficient in the quantization table's matrix for the current component (ie. luminance or chrominance) as defined in the DQT marker.
If I have misunderstood your question, please feel free to ask it again.
2008-01-04Anon
 Just wanted to add another 'THANKS' for creating this page - exactly what I was looking for...
 
2008-01-01Rahul Zutshi
 Even I am assuming an image without any chroma sub-sampling.
I have separated all the DC coefficients while decoding and then using the same for encoding it while taking care of Diff. coding.
Is that OK..??
 That sounds correct, Rahul. To find out what the problem is, I suggest that you run JPEGsnoop with Detailed Scan Decode enabled for the MCU of interest. Then run this on your original, a version rotated by your code and on a version rotated by jpegtran. Then compare the output of JPEGsnoop for the DCT coefficients and you should see where the errors or differences lie.
2008-01-01Rahul Zutshi
 hey...
First of all...my new year wishes to you.

For rotating a JPEG Image :
I am performing the following steps for each mcu block
- decode it.
- convert it from zig zag to natural format
- transpose and mirror it.
- convert from natural to zigzag format.
- encode it.

Am a doing it correctly?
 First, let's assume that you do not have any chroma subsampling, which complicates things a little bit with respect to the ordering. With the transpose and mirroring (which involves changing the sign of the coefficients), the only other piece that you may be missing is the generation of the new DC coefficients. The DC coefficients (in a clockwise lossless rotation) are dependent upon running value from the MCU that is immediately below it, instead of to the left of it.
2007-12-22RyongKyongIl
 Hi, This is great.
But I have a question in the "Relative DC to Absolute DC" part.
I can't understand how you get "block 2:DC value=+518 " after you got "block1: DC value=-512, block2: DC value=1020"
Thank you for seeing my message.
 For every MCU the DC coefficient is encoded as the difference from the running total (absolute value) of the previous MCU/block. We start the file with an absolute value of 0. If the first DC coefficient is decoded as -512, then the absolute value of the first block is (0-512 = -512). If the next DC coefficient is decoded as +1020, then the absolute value of the first block is (last absolute value+difference = -512+1020 = +508). Finally, if we were to encounter a third block with DC coefficient decoded as +24, then the absolute value would become (+508+24 = +532). Hope that helps!
2007-12-20MC
 Hi Calvin,

I'm trying to decode JPEG images where the the images are corrupted (e.g huffman tables no longer available) so i'm trying to use standard Huffman table to decode them. However, I encountered this problem where there's a lot of bit patterns in the image which are not recognized by the standard Huffman table. Does this mean I cannot use the standard Huffman table for the decoding?
Thanks alot.
 Hi there Maria -- Unfortunately, no. Many digital cameras select their own huffman tables. Furthermore, some JPEG encoders (especially image editors) create a custom Huffman table that is optimized for the particular image. That said, depending on the type of digicam, you may be successful in copying over the DHT section from an image of the same digital camera model.
2007-12-10Rahul
 Hi Calvin,
Is the differential encoding that is done on the AC components done differently on different components i.e different encodings for Y,Cr,Cb components.???
 The AC encoding is done the same way for each component (assuming that subsampling has already been taken care of), but a different huffman table may be used. After quantization, the coefficients of each component are run-length coded (collapsing runs of zeros).
2007-12-10mark
 This is Great! but can you explain little more how AC-coefficients are decoded?
 Sure. I tried to keep the page a little simpler by eliminating AC coefficients from it, but I may create a separate walkthrough page that shows AC decoding as well. In the meantime, here are some extra hints: For the AC coefficients, there is no differential coding -- in other words there is no dependency on the previous block / MCU. One searches for the VLC code in the same manner as for the DC component, except that the code is made up of (ZRL,BitLen), where ZRL is the number of zero coefficients before the current value (which requires "BitLen" bits to code the actual value, like for the DC). The other significant difference is that a decoder needs to keep a running count of the number of coefficients that have been decoded for this component. When it reaches 64 (including the DC), or else a EOB code is encountered, the decoder must immediately go to the next component.

For example, let's say that we start with decoding the DC coefficient. We are now on coeff 1 (DC was coeff 0). If we determine that the next VLC code is 0x45, then we know that coeffs 1,2,3,4 are all zero (because ZRL=4) and then we have a huffman bitstring of length 5 bits that will be used to decode coefficient #5. If the next code is 0x36, then we know that coeffs 6,7,8 are also zero, and 9 is defined by the 6-bit bitstring. If we then encounter EOB, then we move on to the next component.

Hope that helps, Cal.
2007-12-06Rahul
 Thanks a lot Calvin.
This was definitely of great help.
 Glad to help!
2007-12-06Rahul
 If what I understand is correct ,
when we rotate a 4:2:0 Image
Y1 Y2 Y3 Y4 Cr Cb will reorder it self in Y component
as Y3 Y1 Y4 Y2 .

Am I correct...???
 That looks right. In a 4:2:0 subsampled image, the sequence for the luminance components (which you are labelling Y1 Y2 Y3 Y4) is left-to-right, top-to-bottom.

Y1Y2
Y3Y4

When you rotate the image clockwise 90 degrees (rotate right), it will become:

Y3Y1
Y4Y2

We still need to encode them in the order left-to-right, top-to-bottom, so this will correspond to the 8x8 blocks that you originally named Y3 Y1 Y4 Y2.

Hope that helps!
2007-12-06Rahul
 Hi Calvin,
I am working on JPEG Rotation. A couple of queries.
Will the Huffman talble that is used to encode the rotated image same which was used to decode the source image?

For a 4:2:0 Image , Will the Coded unit which will have Y1 Y2 Y3 Y4 Cb Cr be same in case of rotated image and the source image?

Thanks.
 When you perform a lossless rotation (e.g. 90 degrees) on an image, the huffman tables can be left the same, however there may be some small file size benefits to recreating the huffman tables (with huffman table optimization).

The quantization tables (JFIF DQT) are generally rotated, as are the subsampling ratios (e.g. if the JFIF SOF indicates a sampling factor of 2x1, then it will become 1x2). The number of luminance entries (Y) in the coded unit will be the same. The subsampling sequence order is still left-to-right, top-to-bottom after rotation.

It might be simpler to consider what happens with an image that only has subsampling in the horizontal direction (4:2:2 or 2x1): In the original image, the MCU order is Y1 Y2 Cr Cb. Y1 is the left 8x8 block and Y2 is the right 8x8 block. After rotation (subsampling ratio is now 1x2), the MCU order is still Y1 Y2 Cr Cb, but Y1 now references the top 8x8 block and Y2 references the bottom 8x8 block.
2007-12-06MC
 Hi,
Thanks for providing the information. This is a great help to my jpeg decoding project.

I have a question though, if I am to decode the JPEG scan data, should I store the data in byte array? Then how can I scan the data bit by bit? Thank a lot! :)
 There are many ways to do the bitstream decoding. I found it easiest to keep a single 32-bit buffer for the bytes coming from the file. I then look at the most-significant bits in the 32-bit word and search for a code that matches (I use a lookup table to speed this up). Finally, I shift the buffer word left however many bits I consumed during that VLC code decode. If there are more than 8 bits free (in the least significant bits in the buffer), then I append the next byte from the file stream. A 32-bit buffer works reasonably well because there are generally no codes that are longer than 16 bits. I found that using a lookup table based on codes up to length 9 bits helped performance considerably.

To help check that your decoder is working, be sure to use the Detailed Decode option in JPEGsnoop to confirm what you are decoding. I provided a little psuedo-code for the VLC code searching on my JPEG decoder design page. If you have any other questions, please feel free to ask.
2007-10-17Matt
 I have a case where the AC huffman tables contain codes of 0x10, 0x20, etc. My understanding is that these values are invalid -> value bit count of zero and a zero byte count of 1, 2, etc. Any idea of what is going on? I actually have found many examples - all of them use progressive encoding. The same Huffman table logic works on 1000's of others.
 I am less familiar with progressive DCT coding, but my understanding is that the codes with zero in the lower 4-bit positions indicates an EOBn (End of Band). This marks the end of the huffman bits within a particular progressive coding spectral band (e.g. DCT coefficients 1-5, etc.). This can be seen in the decode_mcu_AC_first function jdphuff.c file within the IJG source code. I found the JPEG standard a little cryptic in this particular section, but I haven't spent much time examining it.
 
2007-10-10Daniel
 You write:
> there are no variable-length codes that are all ones.

Is that really true?

Isn't the case really that there is no variable-length code that is the prefix of another variable length code?

(That rule is the normal rule for Huffman coding. Is there really any additional rule about all-one variable-length codes?)
 The ITU specification (ITU-T81) does in fact indicate a restriction on the all-1's code:

Code lengths greater than 16 bits are not allowed. In addition, the codes shall be generated such that the all-1-bits code word of any length is reserved as a prefix for longer code words.

If one could be certain of the maximum code length and didn't need to "future proof" the standard, then this restriction might not have been necessary.
2007-10-03Andrew
 Just wanted to say: fantastic page, thank you.
2007-10-03David
 Hi Calvin,

You mentioned the terms 4:2:2 and 4:2:0 and they represent 2x1 and 2x2 subsampling respectively. Can you explain how the ratios 4:2:2 and 4:2:0 come about? What does those numbers represent?

Just trying to understand some terminology. Thanks in advance.
 Good question. The 4:2:2 naming scheme that you see is actually the industry standard when referring to chroma subsampling, not the 2x1 that I write here. I deliberately chose to use the "X x Y" naming scheme to make it simpler to understand.

Nonetheless, the 3-digit ratios (A:B:C) are based upon the following:
  • 4:2:2 - (first digit) - Represents the luminances sampling reference. This is typically chosen to be 4, and is based on the original NTSC TV frequency, ~4 MHz).
  • 4:2:2 - (second digit) - Represents the horizontal chrominance (Cb and Cr) samples relative to the first digit. In other words, if the first digit is 4, and the second 2, then we have half as many chrominance entries in the horizontal direction and luminance.
  • 4:2:2 - (third digit) - This digit has a value that is the same as the second digit, except when it is zero. When it is zero, it means that the vertical chrominance has been subsampled 2:1.

So, a 4:2:2 means 2x1 subsampling (horizontal but no vertical). A 4:2:0 means 2x2 subsampling, etc.

2007-09-18Amir
 Hi
How do you encode a DC value of zero? it does not appear in the DC values table...

Thanks
 In the DC table you'll see an entry for "00" or "End of Block" this represents a zero-length value. Code "01" means a 1 bit data value follows, "02" means a 2 bit data value follows, etc. According to the table, you'll see the bit string "110", which means that the DC component was zero.
2007-07-07Shahbaz
 Dear Sir,

I am a student of Image Processing. I need a Jpeg compression algo using Huffman coding.
The input is an 8 x 8 image array. when the algo is applied on it, it returns the Halfman Coding value for each image part.

Plz help me. its very urgent. Thanks
 The 8x8 image array needs to be converted into the frequency domain (with DCT transform), which will make up a DC component and 63 AC components. You will then need to get or create a Huffman table (simplest is to reuse the standard ones defined in JPEG standard, reproduced here on this site). Finally, you can start the huffman coding process which involves searching for the variable bit-length codes (VLC) representing the DC and each of the 63 AC components.

If you work through my huffman coding example on this page in reverse, you should be able to figure out the steps. If you are really stuck, then have a look at the JPEG Decoder page, where I have provided links to source code.
2007-05-07chris
 calvin, huge thank you, everything is now perfectly clear :)
2007-05-04chris
 calvin, i'm afraid i do need more clarification. i've made your example '8*8 black, 8*8 white' jpeg jfif file but saved with 2*2 chroma subsampling. i've read it expecting the black block to give Y00, Y10, Y01, Y11, Cb and Cr, and then the same (though with different DC coeffs for Y of course) from the white block. what i actually found (all AC coeffs being 0) was: Y00 = -128, Y10 = +255, Y01 = 0, Y11 = 0, Cb = 0, Cr = 0, and then end of file. i think i don't understand subsampling - are the four Y blocks meant to be combined in some way to return a very precise luminosity for one block, or do they refer to adjacent blocks in some way?
 Chris, what you are observing is exactly what will happen in the 2x2 (4:2:0) subsampling case. I have now revised the text and added diagrams to the JPEG decoder page (section on chroma subsampling) that should hopefully make it a little more clear.

When encoding an image of H pixels by V pixels using chroma subsampling, you need to take the following steps:
  • Separate the image into 3 channels: Y, Cb and Cr, each with dimensions H x V pixels.
  • Perform the Chroma Subsampling step: convert the Cb image (sized H x V pixels) into a lower resolution, depending on the amount of subsampling requested. For 2x1 (4:2:2), the new Cb image will be H/2 x V pixels, while for 2x2 (4:2:0) sub-sampling, the new Cb image will be H/2 x V/2 pixels. Repeat for the Cr component. Reducing the resolution of the Cb and Cr components is usually done by averaging adjacent pixels.
  • Segment the 3 image planes (Y, Cb and Cr) after sub-sampling into 8x8 pixel regions. Note that for Cb and Cr components the color information may be derived from regions that are 16x8 or 16x16 in the original image.
  • Encode the components in each MCU, which will contain 1-4 Y components, a Cb and a Cr component. The MCU size will be 8x8, 16x8 or 16x16 pixels.
In your example, you have created 8x8 of black next to 8x8 of white. If you save with 2x1 chroma subsampling (known as 4:2:2), every pair of horizontal pixels are averaged into a single new subsampled pixel for the purposes of encoding the Cb and Cr components. The luminance (Y) are not subsampled.

Each MCU will therefore be taken from a region of 16x8 pixels of the original image. Since the original image is only 16x8 pixels total, this means that there would only be a single MCU in the image. This MCU will encode two Y components (Y0 and Y1) amd then the Cr and Cb from the lowered-resolution color channels.

In your case you have actually saved a 16x8 pixel image with 2x2 (4:2:0) chroma sub-sampling. This means that the averaging is done in both the horizontal and veritical direction. You don't actually have enough pixels to produce a full MCU because the chrominance channels are reduced in resolution by half in both directions. The original channel total image resolution is 16x8 pixels, but after 2x2 subsampling, it becomes 8x4 pixels. You need a full 8x8 pixels to make an MCU. So, the encoder has to zero out the missing lower pixels.

Getting back to the coded sequence, you should expect to have only one MCU, and it will contain the sequence: Y00 Y10, Y01, Y11, Cb, Cr. In your case, Y01 and Y11 don't exist, so they are left as zero by default. Y00 is from the black block and Y10 is from the white block. So, the results you're seeing is exactly as expected.

Hopefully that makes it a little more clearer.
2007-05-02chris
 thanks calvin, of course, that makes sense. another question: if i have chroma subsampling, how are Y1 and Y2 combined?
 I have described the basic sequence of coefficients used in the chroma subsampling scenario on my JPEG decoder page. If you want more clarification, let me know and I'll try to explain it in more detail.
2007-04-30chris
 calvin, this webpage is sooo helpful. quick (and possibly stupid) question: is there ever a danger of ambiguity? for example, if i have a string '1111...' then '11' might mean one thing, and '111' might mean another. does one always just take the shortest string that means something?
 Great question. This is precisely the reason why there are no variable-length codes that are all ones. With this rule in place, there is no longer any ambiguity. On the other hand, it still doesn't help the fact that it is extremely hard to resync up if there were ever any bit errors!
2007-03-21Alfredo Tschamler
 Hola, primero queria pedir disculpas por mandar el comentario en castellano, pero bueno, lo queria hacer mas rapido y no tengo tanta facilidad con el ingles.
mi comentario es principalmente para decirles que yo me estoy por recibir de ingeniero en computacion y estoy haciendo un proyecto final para mi carrera (tesis) en el que necesito decodificar imagenes jpeg, y me parecio muy buena su pagina web y me ayudo muchismo!!!, pero con todo respeto y con la mejor de mis intenciones quiero decirles que creo que hay un error, el cual es el siguiente:

uds dicen que el coeficiente DC 00101 se decodifica como -25, pero si no me equivoco es -26

tambien afirman que "10001 would be 9 and 11110 would be 14" y creo que 1001 seria 9 y 1110 seria 14.
si yo estoy equivocado les pido mil disculpas, este comentario lo hice con buenas intenciones y si yo tengo razon para que sea corregido.
Muchas gracias!!!
 First, my attempt at a translation to English:

Hello, first my apologies in write in Castellano, but I have a quick question and not much in the way of English. My comment is mainly to say that I am a computer engineer and currently working on a final project for my thesis in which I need to decode JPEG images. It's very good that there is this web page and it has helped me a lot! But with the best intentions and respect, I believe that there is an error: you say that the decoded coefficient DC 00101 is -25, but if I'm not mistaken, it is -26. Also you state that 10001 should be 9 and 11110 is 14. I believe that the series 1001 is 9 and the series 1110 is 14. If I am mistaken I request a thousand apologies. Thank you very much!
Yes, you are right! I had three errors in one section of my example but I have now fixed them. Thanks Alfredo for letting me know about these!

The corrections:
00101 = -24
10001 = +17
11110 = +30


For your interest, I am currently working on another web page (not finished yet) that will describe in more detail the process of designing and writing a JPEG decoder. The page will explain a number of the more difficult steps that I encountered while trying to write the JPEGsnoop JPEG decoder. I might also decide to add a feature to JPEGsnoop that would write out some of the decoded variable-length huffman codes from the scan data, which would help anyone else who is trying to build and test a decoder.

If you have any more questions, please feel free to ask!
Good luck!
2007-03-14Anon
 Hi,

The way I see it, there seems to be an inefficiency in the design. You get codes like this:
10
110
1110
11110

But the last code doesn't need to be that long! Look:
10
110
1110
1111

I hope you know what I mean. Using JPEGsnoop I see that a lot of images have this problem, is there any reason a code can't be all 1's? One of the longest codes should be, and it never is...
 This would seem to be an inefficiency, but it turns out to be one of the rules in the JPEG standard. No huffman codes can be all-ones. The reason for this is that occasionally an encoder needs to pad out bytes with extra ones (e.g. at end of file, before restart markers, etc.) and such padding could be misinterpreted as a valid code otherwise.
2007-01-30Jun
 Hi Calvin,

Do you happen to know how to create a DHT for the JFIF inside the Adobe PDF document? There are some case the default DHT been used. I only found the default values, and couldn't figure out what is the frequency of those values. If you have any information, please let me know. many, many thanks, Jun
 From what I can tell, the PDF file format should allow you to embed a standard JPEG JFIF file in its entirety, following the /Type /XObject /Subtype /Image and /Filter /DCTDecode commands.

To my knowledge, I don't believe that there are any restrictions on the DHT that exists within JPEG files embedded in a PDF. If I have misinterpreted your question, please feel free to correct me!
2007-01-27Robert
 Hi, Calvin,

Could you explain how to expand a huffman table from a DHT, like "Codes of length 03 bits (001 total): 03. How to know the expanded form of codes is 100=03 not other form like 110=03 or 101=03. Very thank you, Robert
 Great question. The expansion of the DHT (Huffman Tables) in a JPEG JFIF file into the actual bit strings is a little confusing. JPEG decoders often perform this table expansion (and the creation of lookup tables for performance) in different ways. The simplest way that I know to do it is to walk through the creation of the binary tree on paper as you read each code from the DHT table.

To make this clearer, I have now added a new section on this page to explain the basics of this bit string expansion process. Hopefully this should help answer your question! Be sure to let me know if you have any other questions, or if I have made it even more confusing
2007-01-26Jesse Camacho
 Do you happen to know how zero-run lengths greater than 16 are encoded? I think JPEG uses 0xF0 to encode a string of 15 (really 16) zeros, but what about 20 or 32, assuming that there is a non-zero value between the long run. Many thanks, Jesse
 It's very easy -- 0xF0 represents a single run of 16 zeros (15 zeros and a zero amplitude coefficient). If you need to encode a run length of 20 zeros, then you would observe the code 0xF0 followed by the code representing a run of 4 zeros. 32 would be represented by 0xF0 F0. Hope that makes sense!

 


Leave a comment or suggestion for this page:

(Never Shown - Optional, if you want a reply)
 

NOTE: I am out of the country for several months (in India), so comments will be held and only posted infrequently. Thanks!

11 users online