A Compression Primer

In broad terms, there are only two ways to compress a still image. That is, there are only two kinds of redundancy that can be removed: statistical and perceptual.

Statistical:

Image data is rarely random. If you know what kinds of patterns to expect, you can invent a code that is very efficient for those codes. Examples include:

• Run-length coding. If a certain pattern repeats over and over, you can encode a single instance along with the number of repetitions.

• Dictionary. If a few patterns keep turning up here and there in the data, you can compile a list of them. Thereafter, you can replace the pattern with a number that represents its position in the list.

• Entropy. If some patterns are quite common while others are rarer, you can assign the shortest codes to the most common patterns, with progressively longer codes used for the rarer patterns.

All of these substitutions are perfectly reversible (lossless compression), but they rarely yield a high degree of compression.

Perceptual:

The human eye is the final judge of quality, and the eye sees the world differently than a scanner does. It pays more attention to some features than others. The trick, then, is to discard the data that represents features you suspect won't be missed.

In particular, the eye is sensitive to edges and fine details, and it is sensitive to smoothness of color surfaces, but not to both in the same place. JPEG uses a cosine transform to discover the relative proportions of smoothness and detail (low and high spatial frequencies, respectively) in each block of pixels.

About JPEG format

Standardized in the late 1980s, JPEG has become almost ubiquitous as a format for distributing images. It provides a lot of compression, is supported by most image creation and manipulation programs and is one of the two image formats that are sure to be supported in Web browsers.

The algorithm:

• First, the 24-bit RGB data is transformed into the YC[sub r]C[sub b] color space. The Y channel has the luminance information, while the two C channels encode chrominance as distance along the red-green and blue-yellow axes.

In the steps that follow, each channel of the YC[sub r]C[sub b] data is treated separately. There is no requirement that the channels be encoded with the same precision. In fact, it's common to handle the luminance channel with more precision than the chrominance. The reason is that the human eye is more sensitive to gray-scale details than to color details.

• Blocks. The data of each channel is divided into 8x8-pixel tiles.

• DCT. The Discrete Cosine Transform (DCT) is applied to each tile. The result is 64 coefficients (floating-point numbers) that represent how much energy is present at each "spatial frequency" in the tile. Smooth gradations correspond to low spatial frequencies, while sharp edges correspond to high frequencies.

• Quantization. Each coefficient is rounded off to some level of precision. Small coefficients can simply be zeroed out. Also, the eye has a hard time perceiving both smooth gradations and sharp details in the same area, so one or the other can usually be suppressed. This is the step that makes JPEG a lossy algorithm; the discarded precision can't be restored.

• Statistical coding. Finally, run-length and Huffman coding are applied to the stream of coefficients. This is where the data is actually shrunk. The purpose of all the previous steps has been to regularize the data so that these methods will be effective.

To decompress the file, you simply reverse all the above steps: unpack the Huffman and run-length coding, apply the Inverse Discrete Cosine Transform to each block, unite the three color channels of YC[sub r]C[sub b] data, and map the YC[sub r]C[sub b] color space back to RGB.

Limitations:

As is now well known, JPEG (like any lossy compression technique) creates artifacts and errors in the reconstructed image. Fortunately, the human eye is not sensitive to many of the errors that JPEG causes, and because the scheme allows a tradeoff of quality and compression ratio, a user can often choose a ratio that keeps the visible errors below the threshold of annoyance.

(For example, the JPEG standard allows each tile of an image to be compressed with different quantization steps and thus different quality tradeoffs. Photoshop 6 lets you use mask layers to define regions of the image that should be compressed differently, and thus minimize the visibility of artifacts while keeping overall file size low.)

However, certain kinds of image content are known to cause unavoidable problems. Contrasty edges have little "echoes" in nearby smooth regions, and the borders of adjacent tiles often aren't the same color. Worse, it is not always possible to predict which images will give trouble. This is bad enough in the graphic arts, where a compression artifact might spoil a print run. But in other industries, such as medical imaging, it can be devastating.


from Peter Dyson, A Look at JPEG2000.
Seybold Report on Internet Publishing, Jan2001, Vol. 5 Issue 5

Mathematical research by Lalu Simcik

HOME