How does JPEG work? A very rough answer

By: Andrew Spivak

We will have a look on how JPEG works in the simplest settings. Forget about color images, let's stick to grey-scale images. What is such an image? It is actually a table with numbers, representing brightness of the corresponding points, which vary from 0 to 255. What do we have to do? We have to develop two processes: the first (encoding) will transform an image (table with numbers) to a sequence of 0s and 1s, which is JPEG file. And aim is that such a sequence is not too long, i.e., it represents our picture in a compressed way. The second (decoding) will take JPEG file, the sequence of 0s and 1s, and will recover an image from it. Decoding is an inverse of encoding, so we concentrate on encoding process.

The core of the algorithm operates with small blocks of the image of size 8*8 pixels. There are three main steps: DCT, quantization, Huffman encoding. Sounds as abracadabra, lets look behind this.

DCT - discrete cosine transform. This is a tricky way to transform our image to another "image" of the same size (8*8), with new "colors" being called DCT-coefficients. Such transformation in a way simplifies the picture. In the following sense: if you have that all initial colors are close to some fixed color, the transformed "image", i.e., DCT-coefficients will have many close to zero numbers among themselves. It is easier to encode many zeros and few non-zeros (just trust in this).

Quantization - replacing DCT-coefficients by "rounded" numbers. For example, you have a DCT-coefficient which can be a number from -1023 to 1023. This gives you 2047 possibilities, and generally requires 12 bits (0s and 1s) to be stored in computer memory. OK, but let's say: we don't need to know that number exactly, some close number would be enough. Then one picks how rough we want to go, say we want to be not farther then 5. Then we can round up to 10. I.e., if we have 437, replace it by 440, if we have -171, replace it by -170. This way we have less possibilities - only numbers -1020, -1010, ..., -10, 0, 10, 20, ..., 1020. There are 205 cases then (much less than 2047), and 8 bits are enough to represent such a number. That's the idea. So, on quantization step, DCT-coefficients are rounded. This is where we loose some information, i.e., we would not be able to recover original image from quantized DCT-coefficients. But still, they are close enough to the real DCT-coefficients, so that the inverse DCT transform will give an image pretty close to our original one.

Huffman encoding - a way to store what is obtained after quantization. (There is in fact a small preliminary stage between quantization and Huffman encoding, called RLE, but we omit it now.) The goal is to code something which is frequent by bit combinations which are short. Example: you need to code a sequence of 100 letters, written using only a,b,c,d,e,f,g,h, 8 letters. Straightforward approach: 8 possibilities, 3 bits for each possibility, a=000, b=001, c=010, d=011, e=100, f=101, g=110, h=111, then you can always encode 100 letters using 100*3 = 300 bits. In practice, it may happen that 'a' appears much more frequently than other letters in your sequence. Then it is reasonable to spend less than 3 bits for such a letter. Of course, then you may need more bits for a rare letter, but overall you may win. This is the idea behind Huffman encoding - code frequent letters with shorter bit combinations. "Letters" in JPEG Huffman encoding are not letters, but properly arranged quantized DCT-coefficients.

That's all the magic. Without details. In summary, the ideas of JPEG compression algorithms are: special transform of the image (DCT), rounding the resulting numbers (quantization of coefficients) and smart coding of the result (Huffman) spending less info for more frequent "letters".

Computers
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 
 • 

» More on Computers