Return to Digital Photography Articles
Designing a JPEG Decoder & Source Code
This page is intended to provide some further details about the process of decoding JPEG images, based on my experience in writing the JPEGsnoop application. Source code examples are provided in C++.
Chroma Subsampling and the Huffman Scan data
Decoding the Scan Data segment (Marker: JFIF SOS, 0xFFDA) becomes a little more complicated when you need to deal with chroma subsampling. Since nearly all JPEG images are compressed with chroma subsampling, it is important that the decoder support the most common types of subsampling:
| Subsampling HxV | Description | Comments |
|---|---|---|
| 1x1 | No Subsampling | Very few digital cameras (Sigma SD is a notable exception) output images without subsampling the color channel. Instead, you'll generally find these files generated from image editing applications when a high quality setting was used. |
| 2x1 | Horizontal subsampled | The majority of reasonable digital cameras produce JPEG images with 2x1. |
| 1x2 | Vertical subsampled | Typically from 2x1 subsampled images that have been rotated by 90 degrees. |
| 2x2 | Horizontal & Vertical subsampled | Cheaper digital cameras and image editors saving photos at low quality settings. |
| others | Not very common at all for digital photos. Other sources such as miniDV camcorders will use 4x1 chroma subsampling. |
In each case, the encoding sequence is always to complete the luminance MCUs (1 or more) before going on to the chrominance components. Instead of observing a half or quarter as many color channel components, you'll find twice or four times as many luminance components encoded per block.
Coded Sequence for 1x1 (4:4:4)
The following 3 components make up one MCU, and are repeated in this sequence for every MCU. Note that the MCU represents a pixel area of 8x8 pixels.
[Y0(dc), Y0(ac)],
[Cb(dc), Cb(ac)],
[Cr(dc), Cr(ac)]
Coded Sequence for 2x1 (4:2:2)
The following 4 components make up one MCU. The MCU represents a pixel area of 16x8 pixels. Y0 and Y1 refer to two 8x8 pixel regions adjacent horizontally, while Cb and Cr refer to the single 16x8 pixel region that covers Y0 and Y1. To perform the encoding process, the Cb and Cr channels are first sub-sampled horizontally (this is usually done by averaging each pair of horizontal pixels into a single value). Then an MCU is made up of 8 averaged pixels horizontally by 8 original pixels veritically. The DC and AC coefficients are calculated on this new region.
[Y0(dc), Y0(ac)]
[Y1(dc), Y1(ac)],
[Cb(dc), Cb(ac)],
[Cr(dc), Cr(ac)]
Coded Sequence for 2x2 (4:2:0)
The following 6 components make up one MCU. The MCU represents a pixel area of 16x16 pixels. Y00 and Y10 refer to two 8x8 pixel regions adjacent horizontally, Y01 and Y11 refer to the two 8x8 pixel regions below Y00 and Y10. Cb and Cr refer to a single 16x16 pixel region that covers all four pixels (Y00, Y10, Y01 and Y11). To perform the encoding process, the Cb and Cr channels are first sub-sampled horizontally and vertically (this is usually done by averaging all four pixels into a single value). Then an MCU is made up of 8 averaged pixels horizontally by 8 averaged pixels veritically (which are based on the original region of 16x16 pixels). The DC and AC coefficients are calculated on this new region.
[Y00(dc), Y00(ac)]
[Y10(dc), Y10(ac)],
[Y01(dc), Y01(ac)],
[Y11(dc), Y11(ac)],
[Cb(dc), Cb(ac)],
[Cr(dc), Cr(ac)]
Variable-length Huffman Codes
Reading DHT Tables source code
Interpreting the DHT table in the JPEG JFIF file stream is not as intuitive as it may appear. It was designed to be stored in the most space-efficient manner, which makes reading it much harder. On paper, it is a fairly simple process to fill out the binary tree with the huffman codes as they appear in the DHT marker segment, but this doesn't translate well to the coded form.
The following example source code is completely un-optimized, as it is intended to show the basic mechanism of reading the DHT table and decoding the variable length JPEG huffman codes. It is based upon my experiences in developing the JPEGsnoop JPEG decoder from scratch.
Instead, one will observe that the codes that appear within the same bitlength will be incrementing by one. Going to the next bitlength requires the next code value to be doubled before continuing the count by one. This method is much more apparent when observing the resulting binary values that the codes correspond to.
Note that a two-dimensional array is used here to retain all of the variable-length codes. The first array index is used to reference the DHT table that we are processing (e.g. AC, DC, Luminance, Chrominance, etc.) The second index is simply to cover all of the possible code strings (max 162).
Locate the DHT Marker
repeat {
class & dest_id = <Read Byte>
table_set = dest_id*2 + class // Arbitrary mapping
table_ind = 0
Reset DhtCodeList[0..255] = 0xFFFF (dummy termination)
DhtCodesLen[1..16] = <Read Bytes (x16)>
for (ind_len=1; ind_len <= 16; ind_len++) {
for (ind_code=0; ind_code < DhtCodesLen[ind_len]; ind_code++) {
{
DhtCodeList[dht_ind++] = <Read Byte>
}
}
// Generate variable-length binary bit strings
code_val = 0;
dht_ind = 0;
for (bit_len=1; bit_len <= 16; bit_len++) {
for (code_ind=1; code_ind <= DhtCodesLen[bit_len]; code_ind++) {
tmp_mask = ( (1 << (bit_len))-1 ) << (32-bit_len);
tmp_bits = code_val << (32-bit_len);
tmp_code = DhtCodeList[dht_ind];
dht_lookup_bitlen [table_set][table_ind] = bit_len;
dht_lookup_bits [table_set][table_ind] = tmp_bits;
dht_lookup_mask [table_set][table_ind] = tmp_mask;
dht_lookup_code [table_set][table_ind] = tmp_code;
table_ind++;
code_val++;
dht_ind++;
}
code_val <<= 1;
}
dht_lookup_size[table_set] = table_ind;
} until all DHT tables processed
Once you have stored the variable-length codes into structures that make it easy to search later, we can easily locate the codes in the bitstream.
dht_lookup_mask[][0..# codes] = Left-most bit mask for bitstring compare
dht_lookup_bits[][0..# codes] = Left-most bits for bitstring compare
dht_lookup_code[][0..# codes] = The 8-bit code value corresponding to this bitstring
dht_lookup_bitlen[][0..# codes] = The number of bits in the variable-length code
Searching the DHT Huffman Codes source code
The most simple way to locate a variable length code is to search linearly through the list of codes using the bitmask, looking for a match:
scan_buff = <Next 8 unused bits from file, shifted left align to bit 31>
// Slow search for variable-length huffman code
while (!done) {
// Does the bit string match?
if ((scan_buff & dht_lookup_mask[set][ind]) == dht_lookup_bits[set][ind]) {
code = dht_lookup_code[set][ind];
bits_used += dht_lookup_bitlen[set][ind];
done = true;
}
ind++;
// Just in case no match was found, check for end of array
if (ind >= dht_lookup_size[set]) {
done = true;
}
}
After the above, bits_used indicates how long the variable-length code portion is, which can be used to shift the scan_buff for the next bit sequence.
JPEG Decoder Source Code
The following are some free resources that provide full source code implementations of other JPEG decoders. Several of these implementations take very different approaches, some simpler than others. Generally, the less refined / optimized versions are easier to understand.
- IJG 6b - The most widely-used JPEG Encoder / Decoder library, written in C. Fairly complex source code as it has been optimized heavily and is designed to work with a wide range of exception cases. Includes sample command-line applications (such as cjpeg, djpeg and jpegtran).
- Smaller Animals - Visual C++ code that demonstrates a simple application making use of the IJG 6a JPEG decoder libraries.
- Cristi Cuturicu - Very simple decoder source in C++. Some ASM optimization. Limited comments.
- Rich Geldreich - JPEG decoder. Includes MMX optimizations, progressive images. Limited comments.
- Cornell University - Lossless JPEG only
- Stanford University - Designed for research and experimentation (not optimized for production). Supports Lossless JPEG.
- JPEGsnoop (Calvin Hass) - JPEG decoder and Windows application. Fresh ground-up decoder library in Visual C++ with lots of comments :). Full source code not yet available.
Other Sections Coming soon:
JPEG JFIF Markers Parsing
| Hex | Marker | Marker Name | Description |
|---|---|---|---|
| 0x FFC0 | SOF0 | Start of Frame 0 | Baseline DCT |
| 0x FFC1 | SOF1 | Start of Frame 1 | Extended Sequential DCT |
| 0x FFC2 | SOF2 | Start of Frame 2 | Progressive DCT |
| 0x FFC3 | SOF3 | Start of Frame 3 | Lossless (sequential) |
| 0x FFC4 | DHT | Define Huffman Table | |
| 0x FFC5 | SOF5 | Start of Frame 5 | Differential sequential DCT |
| 0x FFC6 | SOF6 | Start of Frame 6 | Differential progressive DCT |
| 0x FFC7 | SOF7 | Start of Frame 7 | Differential lossless (sequential) |
| 0x FFC8 | JPG | JPEG Extensions | |
| 0x FFC9 | SOF9 | Start of Frame 9 | Extended sequential DCT, Arithmetic coding |
| 0x FFCA | SOF10 | Start of Frame 10 | Progressive DCT, Arithmetic coding |
| 0x FFCB | SOF11 | Start of Frame 11 | Lossless (sequential), Arithmetic coding |
| 0x FFCC | DAC | Define Arithmetic Coding | |
| 0x FFCD | SOF13 | Start of Frame 13 | Differential sequential DCT, Arithmetic coding |
| 0x FFCE | SOF14 | Start of Frame 14 | Differential progressive DCT, Arithmetic coding |
| 0x FFCF | SOF15 | Start of Frame 15 | Differential lossless (sequential), Arithmetic coding |
| 0x FFD0 | RST0 | Restart Marker 0 | |
| 0x FFD1 | RST1 | Restart Marker 1 | |
| 0x FFD2 | RST2 | Restart Marker 2 | |
| 0x FFD3 | RST3 | Restart Marker 3 | |
| 0x FFD4 | RST4 | Restart Marker 4 | |
| 0x FFD5 | RST5 | Restart Marker 5 | |
| 0x FFD6 | RST6 | Restart Marker 6 | |
| 0x FFD7 | RST7 | Restart Marker 7 | |
| 0x FFD8 | SOI | Start of Image | |
| 0x FFD9 | EOI | End of Image | |
| 0x FFDA | SOS | Start of Scan | |
| 0x FFDB | DQT | Define Quantization Table | |
| 0x FFDC | DNL | Define Number of Lines | (Not common) |
| 0x FFDD | DRI | Define Restart Interval | |
| 0x FFDE | DHP | Define Hierarchical Progression | (Not common) |
| 0x FFDF | EXP | Expand Reference Component | (Not common) |
| 0x FFE0 | APP0 | Application Segment 0 | JFIF - JFIF JPEG image AVI1 - Motion JPEG (MJPG) |
| 0x FFE1 | APP1 | Application Segment 1 | EXIF Metadata, TIFF IFD format, JPEG Thumbnail (160x120) Adobe XMP |
| 0x FFE2 | APP2 | Application Segment 2 | ICC color profile, FlashPix |
| 0x FFE3 | APP3 | Application Segment 3 | (Not common) JPS Tag for Stereoscopic JPEG images |
| 0x FFE4 | APP4 | Application Segment 4 | (Not common) |
| 0x FFE5 | APP5 | Application Segment 5 | (Not common) |
| 0x FFE6 | APP6 | Application Segment 6 | (Not common) NITF Lossles profile |
| 0x FFE7 | APP7 | Application Segment 7 | (Not common) |
| 0x FFE8 | APP8 | Application Segment 8 | (Not common) |
| 0x FFE9 | APP9 | Application Segment 9 | (Not common) |
| 0x FFEA | APP10 | Application Segment 10 PhoTags |
(Not common) ActiveObject (multimedia messages / captions) |
| 0x FFEB | APP11 | Application Segment 11 | (Not common) |
| 0x FFEC | APP12 | Application Segment 12 | Picture Info (older digicams), Photoshop Save for Web: Ducky |
| 0x FFED | APP13 | Application Segment 13 | Photoshop Save As: IRB, 8BIM, IPTC |
| 0x FFEE | APP14 | Application Segment 14 | (Not common) |
| 0x FFEF | APP15 | Application Segment 15 | (Not common) |
| 0x FFF0 ... 0x FFF6 |
JPG6 | JPEG Extension 0 ... JPEG Extension 6 |
(Not common) |
| 0x FFF7 | JPG7 SOF48 |
JPEG Extension 7 JPEG-LS |
Lossless JPEG |
| 0x FFF8 | JPG8 LSE |
JPEG Extension 8 JPEG-LS Extension |
Lossless JPEG Extension Parameters |
| 0x FFF9 | JPG9 | JPEG Extension 9 | (Not common) |
| 0x FFFA | JPG10 | JPEG Extension 10 | (Not common) |
| 0x FFFB | JPG11 | JPEG Extension 11 |
(Not common) |
| 0x FFFC | JPG12 | JPEG Extension 12 | (Not common) |
| 0x FFFD | JPG13 | JPEG Extension 13 | (Not common) |
| 0x FFFE | COM | Comment | |
| 0x FFFF | Stuff | Stuff Byte | Representation of 0xFF in data stream |
Counting AC Huffman Codes
Makernotes
YCC to RGB Color Conversion
Restart Markers
Variable Length Decode Optimization

Reader's Comments:
Please leave your comments or suggestions below!I have a situation where in i have a image which is in the RGB color space and the image was created using Irfan viewer. However when tried to view though some other editors the color appears to be different like if i have color as Red it will appear as green. I am trying to save an image of 1x1 pixel with red color in Irfan viewer and when tried to open in the windows gallery the color appears to be green. If some one has encountered this problem will it be possible for them share.
In our decoder first we check that whether the stream contains the standard Huffman table or not by comparing no of coeff for a particular length and the corresponding Run/Size value if all matches then we decode its a JPEG stnd table and we decode from the prestored coeffnt.Otherwise we create a huffman tres correnponding to a particular length where we store the code word and the Run/Size.
For this particular image,it is having standard huffman table but the code word is different for a particlur Run/Size and gives corrupted output.
My question is that do we need to compare the code word along with Run/Size to make a decision for a standard JPEG huffman table or not?
I am just starting to scan all my pictures from my old photo albums to archive and was wondering if you had any thoughts about which scanners do a decent enough job for such a task and/or which to avoid. I have a Dell All-in-One scananer/printer that seems to do okay. But I'd hate to find out hundreds of scans into this project that it's not a good choice in the long run. Any suggestions or can you suggest a site that covers this topic? Thanks!
That said, I had begun to do a similar task, scanning in all of my old photo albums. Originally I had looked at scanners with document feeders (eg. one from HP), but I didn't like the way that the photos were spun through the feeder (concerned about damaging / creasing the prints). I have since moved on to use my Canon flatbed scanner (8400) and use the multi-scan mode to capture & crop 3 or 4 4x6 prints at a time.
Good luck!
So now that I can programatically recreate the huffman tables and have a basic search thing going, I'm back to manually decoding the actual image data so that I can understand what my program will have to do.
Can I check a couple of points with you?
1) The image data must be stripped of its byte stuffings before/during decoding. I think wikipedia says that a byte stuffing is 0xFF00, but in your table you say it is 0xFFFF... could it be both or am I misunderstanding something?
2) If there are 4 Huffman tables and the ffc0 marker says there are 3 image components then Huffman Table 1 = lum DC, Huffman Table 2 = lum AC, Huffman Table 3 = Chrm Cb & Cr DC, Huffman Table 4 = Chrm Cb & Cr AC.
3) The order of the decoding process is:
a) Lum DC - (First code is length of useful bits. Followed by this length). In my example I get: 1110=0x06, so the next 6 bits are: 010101. 010=0x01 and 101=0x04. So the Huffman decoded part of lum dc is 0x06,0x01,0x04.
b) Lum AC - (First code is split into run, size.) I get 100=0x03 so a run of 0, then 3 bits of useful stuff. The next 3 bits are also 100. So final component is 0x03,0x03.
c) Chrm Cb DC - (First code is length of useful bits. Followed by this length)
This is where my decoding breaks down. I get a binary code of 110=0x03. The next 3 bits are 111 which is illegal. When I compare what I've done with JpegSnoop, I notice that JpegSnoop has all the above entries under a Lum table:
Lum (Tbl #0), MCU=[0,0]
[0x00000284.0]: ZRL=[ 0] Val=[ -42] Coef=[00= DC] Data=[0x E5 64 DF AA = 0b (11100101 01------ -------- --------)]
[0x00000285.2]: ZRL=[ 0] Val=[ 4] Coef=[01..01] Data=[0x 64 DF AA CA = 0b (--100100 -------- -------- --------)]
[0x00000286.0]: ZRL=[ 1] Val=[ 3] Coef=[02..03] Data=[0x DF AA CA 5F = 0b (1101111- -------- -------- --------)]
The first two entries seem to match my decoding, but why are they under a Lum table? The third entry gives 1101111. I've tried using all sorts of combinations of different Huffman tables but I can't work out how you could get this result...
I also noticed the ZRL=1 bit but in my Huffman Tables there are only two F0 symbols and the codes for them are completely different to what I have here.
Basically I'm completely lost here and clearly doing stuff wrong!
- The byte stuffs in the coded bitstream are indeed 0xFF00. I believe there may be allowances for a string of 0xFF's to appear, but don't recall running into this.
- This is the usual sequence, but you have to check the "class" codes to ensure that the channel vs AC/DC selection is in the order you expect.
- I think the place where you are going wrong is with your Lum AC. You need to keep decoding coefficients until you have either encountered an "EOB" or you have filled all 64 coefficients in the MCU. This must be done before moving on to the Chrominance components.
Hope that helps!Specifically:
tmp_mask = ( (1 << (bit_len))-1 ) << (32-bit_len);
tmp_bits = code_val << (32-bit_len);
tmp_code = DhtCodeList[dht_ind];
These lines seem to be the all important ones of getting the actual binary code for the symbol in question, but I just can't work out how they work!
I hope you don't mind me asking for a bit more help. I've moved on from hex editors (decoding by hand is just too painful) and on to the java now. Unfortunately I'm just too thick to get how you figure out the binary codes for each symbol within your nested for loops. I tried to work it out by myself as well but can only come up with very convoluted methods which are clearly highly unefficient.
I wonder if you could explain that step to me with an example or two?!
As I stated in a previous comment I am working on a web capture system. I got my mjpeg decoder done (thanks for the help on that) but I am getting an image with very distorted color information (when compared to JPEGsnoop's output). I traced my problem to the inverse DCT component used in my decoder. The values I get are either two high or two low. I used the equation that was in the JPEG standard, what equation did you use?
I appreciate your quick response, and I'm sure I'm missing something, but the tutorial you'd provided only provided an example with a DC encoded value, and leaving all AC values as 0. My problem is that I'm struggling to understand how to decode a DCT with non-zero AC values. The way I understand it, it's not as easy as just adding the AC to the DC and you have the specified pixel. I believe I have to calculate an arccosine on the values somehow, or something to that effect. Can you point me in the right direction, taking it step by step? I'd be very appreciative, thank you.
This product and your explanations have been tremendous. I'm just having one apparently hard time trying to figure out how to implement the IDCT. I've found the matrix transform formula, but I don't understand how to iterate through the matrices and come up with the original pixel luminance and chrominance values. Can you give me an example of how to go through one MCU and figure out the values before level shift?
This is from JPEGsnoop, in MCU[0,0]. This was with subsampling set to 4:2:0. My idea was to go into MS Paint and set the image width and height to 16x16, and set the top left pixel to a red color. Then I saved the image to JPEG. This is what I came up with.
Y0 table:
Cr table:
All other tables' values were all set to 0. (Y1, Y2, Y3 and Cb). So again, I'm looking to decode these values to the RGB space myself, and I'm having trouble understanding the IDCT formula. Could you walk me through this?
I did exactly this sort of walkthrough in the JPEG Huffman Coding Tutorial page. Hopefully that should point you in the right direction.
I have a JPEG Decoder that I've written based on varying sources for use in Silverlight and coded in C#.
All works well albeit slow - I only need to generate a small thumbnail from the source JPEG and so I have gone the path of decoding some of the MCUs and sampling the results - images are too ugly to use.
I noticed your App does either DC or DC and AC ... how do I do same for performance gain?
I have an IDCT but am uncertain of how to work with just the DC Table vs DC and AC.
thanx.
So, if you simply rely on DC components only, you will still need to do all of the Huffman VLC decode for all the DC and AC components, but you don't have to perform the inverse discrete cosine transform (IDCT). Simply take each of the DC components and convert them to the "average" color for the entire MCU. If you're just after creating a thumbnail, this is an easy way to display an approximate 1/8 view of the original image.
I am having a tough time extracting thumbnails from Thumbs.db. Extracted images from windows 2000 Thumbs.db is not stangard JPEG (Seems so). Can you point out how i can handle the situation? Shall i try to send you one or two sample pics <10K)
Regards,
IKM
Why DRI marker and when it is required...
thanks ,
giri
which part of the JPEG-encoding is the one which takes the much calculating time? I think either die DCT or the Huffman-Encoding. Do someone knows a source or a www-site which explains _in detail_ the calculting-time-usage of the different parts in JPEG-Encoding?
Im also (or especially!) interested in knowing this about jpeg 2000. Why takes the encoding so much time, is it the DWT or the arithmetic coding (in case of JPEG 2000 its the special EBCOT).
thank you very much,
Casiopaya
JPEGsnoop is very nice also. You may want to add these JPEG decoder implementations: Cornell and Stanford.
Please see: http://www.faqs.org/faqs/jpeg-faq/part2/section-15.html
The information about ijg not supporting lossless is not correct - you just have to find the right patches.
The link for the Stanford version is dead: try: http://hpux.cs.utah.edu/ftp/hpux/X11/Graphics/JPEG-1.2.1/JPEG-1.2.1-src-11.00.tar.gz
My question is if you have encountered this issue yourself or have any idea what the reason behind it is?
Could it be related to YUV to RGB conversion? Could this conversion really reduce the color quality, and in such case is there any specific conversion that should be used from YCbCr to RGB?
All the colors are correct and for high detail areas the color is correct but for open areas like skies the harsh edges between different shades are very visible.
As an aside, you say "the total amount of colors of the image just dropped radically after decoding" -- what is your reference point? Is your original image an uncompressed bitmap that you then compressed as a JPEG? What program was used and what settings?
Why not? I'm looking for it
You would first need to decompress the variable-length huffman codes (VLC) into their DC & AC coefficient values. From there, you could manipulate the YCbCr coefficients producing a desired color transform -- but realize that this will be in the frequency domain, not spatial domain. Including the IDCT step (frequency to spatial conversion) is lossy, so you'd want to avoid that if possible.
Once you modified the coefficients, you'd then look up the Huffman VLC codes that represent the new coefficient values. This step is guaranteed to produce some VLC codes that are longer or shorter than the original codes. Therefore, the resulting scan data segment will change in length and alignment significantly. This is effectively recompressing the entire image, but since you are only going from coefficients to huffman codes (ignoring quantization), this can still be lossless.
Trying to do editing of a JPEG file by manipulating only a few bytes "in-place" within the huffman VLC bitstream is extremely difficult, but it's possible (I've just recently accomplished this in my latest development versions of JPEGsnoop which perform JPEG image repair).
Writing a utility that does a "lossless" color conversion in the YCbCr frequency domain is a very interesting idea. I may try experimenting with this. One area that I'd have to look into further is how the color transform (i.e. multiplier) would be represented when working in the frequency domain (instead of the spatial domain) and if it introduces other issues.
As a simple test, I might see if I can change the brightness of an image by modifying the luminance component in this manner.
Great question, Ken!
For example, let's take an image that reports itself as 1800x1200 (in the JFIF SOF marker segment as Number of Samples per Line and Number of Lines, respectively). Furthermore, you decode the image as having 2x1 chroma subsampling (my term for chrominance resolution halved horizontally and full resolution vertically), from the SOF component Sampling Factor entries.
We then can conclude that the size of a single MCU is 16x8 pixels. The "padded" pixels in the horizontal direction is 1800 mod 16 = 8 and in the vertical direction: 1200 mod 8 = 0.
If I've misread your question, please let me know!
With progressive scanning, the image is coded in a series of passes. In each pass, only a subset of DCT coefficients in each MCU are selected (whereas for baseline, the entire 8x8 set of coefficients per MCU are coded in one pass). The selection of coefficients to code is defined by the mode of operation: spectral selection or successive approximation.
NOTE: I am on vacation, so comments will not be posted until I return. Thanks!