3.3: Image Compression
Digital images, such as the photographs taken on your phone, are displayed as a rectangular array of pixels. For example, the photograph in Figure 3.3.1 is 1440 pixels wide and 1468 pixels high. If we were to zoom in on the photograph, we would be able to see individual pixels, such as those shown on the right.
A lot of data is required to display this image. A quantity of digital data is frequently measured in bytes, where one byte is the amount of storage needed to record an integer between 0 and 255. As we will see shortly, each pixel requires three bytes to record that pixel's color. This means the amount of data required to display this image is \(3\times1440\times1468 = 6,341,760\) bytes or about 6.3 megabytes.
Of course, we would like to store this image on a phone or computer and perhaps transmit it through our data plan to share it with others. If possible, we would like to find a way to represent this image using a smaller amount of data so that we don't run out of memory on our phone and quickly exhaust our data plan.
As we will see in this section, the JPEG compression algorithm provides a means for doing just that. This image, when stored in the JPEG format, requires only 467,359 bytes of data, which is about 7% of the 6.3 megabytes required to display the image. That is, when we display this image, we are reconstructing it from only 7% of the original data. This isn't too suprising since there is quite a bit of redundancy in the image; the left half of the image is almost uniformly blue. The JPEG algorithm detects this redundancy by representing the data using appropriate bases.
Preview Activity 3.3.1.
Since we will be using various bases and the coordinate systems they define, let's review how we translate between coordinate systems.
- Suppose that we have a basis \(\bcal=\{\mathbf v_1,\mathbf v_2,\ldots,\mathbf v_m\}\) for \(\mathbb R^m\text{.}\) Explain what we mean by the representation \(\coords{\mathbf x}{\bcal}\) of a vector \(\mathbf x\) in the coordinate system defined by \(\bcal\text{.}\)
- If we are given the representation \(\coords{\mathbf x}{\bcal}\text{,}\) how can we recover the vector \(\mathbf x\text{?}\)
- If we are given the vector \(\mathbf x\text{,}\) how can we find \(\coords{\mathbf x}{\bcal}\text{?}\)
-
Suppose that
\begin{equation*} \bcal=\left\{\twovec{1}{3},\twovec{1}{1}\right\} \end{equation*}
is a basis for \(\mathbb R^2\text{.}\) If \(\coords{\mathbf x}{\bcal} = \twovec{1}{-2}\text{,}\) find the vector \(\mathbf x\text{.}\)
- If \(\mathbf x=\twovec{2}{-4}\text{,}\) find \(\coords{\mathbf x}{\bcal}\text{.}\)
Color models
A color is represented digitally by a vector in \(\mathbb R^3\text{.}\) There are different ways in which we can represent colors, however, depending on whether a computer or a human will be processing the color. We will describe two of these representations, called color models , and demonstrate how they are used in the JPEG compression algorithm.
Digital displays typically create colors by blending together various amounts of red, green, and blue. We can therefore describe a color by putting its constituent amounts of red, green, and blue into a vector \(\threevec{R}{G}{B}\text{.}\) The quantities \(R\text{,}\) \(G\text{,}\) and \(B\) are stored with one byte of information so they are allowed to vary between 0 and 255. This is called the \(RGB\) color model.
We define a basis \(\bcal=\{\mathbf v_1,\mathbf v_2,\mathbf v_3\}\) where
to define a new coordinate system with coordinates we denote \(Y\text{,}\) \(C_b\text{,}\) and \(C_r\text{:}\)
The coordinate \(Y\) is called luminance while \(C_b\) and \(C_r\) are called blue and red chrominance , respectively. In this coordinate system, luminance will vary from 0 to 255, while the chrominances vary between -127.5 and 127.5. This is known as the \(YC_bC_r\) color model. (To be completely accurate, we should add 127.5 to the chrominance values so that they lie between 0 and 255, but we won't worry about that here.)
Activity 3.3.2.
In this activity, we will explore the difference between these two coordinate systems.
-
First, we will explore the \(RGB\) color model.
The diagram below enables you to create colors using various amounts of red, green, and blue. For each of these three quantities, the slider varies between 0 and 255.
- What happens when \(G=0\text{,}\) \(B=0\) (pushed all the way to the left), and \(R\) is allowed to vary?
- What happens when \(R=0\text{,}\) \(G=0\text{,}\) and \(B\) is allowed to vary?
- How can you create black in this color model?
- How can you create white?
-
Next, we will explore the \(YC_bC_r\) color model.
The diagram below enables you to create colors using various amounts of luminance \(Y\), blue chrominance \(C_b\), and red chrominance \(C_r\). The lumninance slider moves between 0 and 255 while the chrominance sliders move between -127.5 and 127.5.
- What happens when \(C_b=0\) and \(C_r=0\) (kept in the center) and \(Y\) is allowed to vary?
- What happens when \(Y=0\) (pushed to the left), \(C_r=0\) (kept in the center), and \(C_b\) is allowed to increase between 0 and 127.5?
- What happens when \(Y=0\text{,}\) \(C_b=0\text{,}\) and \(C_r\) is allowed to increase between 0 and 127.5?
- How can you create black in this color model?
- How can you create white?
-
Verify that \(\bcal\) is a basis for \(\mathbb R^3\text{.}\)
- Find the matrix \(C_{\bcal}\) that converts from \(\threevec{Y}{C_b}{C_r}\) coordinates into \(\threevec{R}{G}{B}\) coordinates. Then find the matrix \(C_{\bcal}^{-1}\) that converts from \(\threevec{R}{G}{B}\) coordinates back into \(\threevec{Y}{C_b}{C_r}\) coordinates.
-
Find the \(\threevec{Y}{C_b}{C_r}\) coordinates for the following colors and check, using the diagrams above, that the two representations agree.
- Pure red is \(\threevec{R}{G}{B}= \threevec{255}{0}{0}\text{.}\)
- Pure green is \(\threevec{R}{G}{B}= \threevec{0}{255}{0}\text{.}\)
- Pure blue is \(\threevec{R}{G}{B}= \threevec{0}{0}{255}\text{.}\)
- Pure white is \(\threevec{R}{G}{B}= \threevec{255}{255}{255}\text{.}\)
- Pure black is \(\threevec{R}{G}{B}= \threevec{0}{0}{0}\text{.}\)
-
Find the \(\threevec{R}{G}{B}\) coordinates for the following colors and check, using the diagrams above, that the two representations agree.
- \(\threevec{Y}{C_b}{C_r}=\threevec{128}{0}{0}\text{.}\)
- \(\threevec{Y}{C_b}{C_r}=\threevec{128}{60}{0}\text{.}\)
- \(\threevec{Y}{C_b}{C_r}=\threevec{128}{0}{60}\text{.}\)
-
Write an expression for
- The luminance \(Y\) as it depends on \(R\text{,}\) \(G\text{,}\) and \(B\text{.}\)
- The blue chrominance \(C_b\) as it depends on \(R\text{,}\) \(G\text{,}\) and \(B\text{.}\)
- The red chrominance \(C_r\) as it depends on \(R\text{,}\) \(G\text{,}\) and \(B\text{.}\)
Explain how these quantities can be roughly interpreted by stating that
- the luminance represents the brightness of the color.
- the blue chrominance measures the amount of blue in the color.
- the red chrominance measures the amount of red in the color.
These two color models provide us with two ways to represent colors, each of which is useful in a certain context. Digital displays, such as those in phones and computer monitors, create colors by combining differing amounts of red, green, and blue. The \(RGB\) model is therefore most relevant in digital applications.
By contrast, the \(YC_bC_r\) color model was created based on research into human vision and aims to concentrate the most visually important data into a single coordinate, the luminance, to which our eyes are most sensitive. Of course, any basis of \(\mathbb R^3\) must have three vectors so we need two more coordinates, blue and red chrominance, if we want to represent all colors.
To see this explicitly, shown in Figure 3.3.2 is the original image and the image as rendered with only the luminance. That is, on the right, the color of each pixel is represented by only byte, which is the luminance. This image essentially looks like a grayscale version of the original image with all its visual detail. In fact, before digital television became the standard, television signals were broadcast using the \(YC_bC_r\) color model. When a signal was displayed on a black-and-white television, the luminance was displayed and the two chrominance values simply ignored.
For comparison, shown in Figure 3.3.3 are the corresponding images created using only the blue chrominance and the red chrominance. Notice that the amount of visual detail is considerably less in these images.
In the JPEG compression algorithm, we are interested in representing an image using the smallest amount of data possible. By converting from the \(RGB\) color model to the \(YC_bC_r\) color model, we are concentrating the most visually important data into a single quantity. This is helpful because we can safely ignore some of the data in the chrominance values since that data is not as visually important.
The JPEG compression algorithm
The key to representing the image using a smaller amount of data is to detect redundancies in the data. For this reason, we will break the image, which is composed of \(1440\times 1468\) pixels, into small \(8\times8\) blocks of pixels. For example, we will consider the \(8\times8\) block of pixels outlined in green in the original image, shown on the left of Figure 3.3.4. The image on the right zooms in on the block.
Notice that this block, as seen in the original image, is very small. If we were to change some of the colors in this block slightly, our eyes would probably not notice.
Following our earlier work, we will change the representation of colors from the \(RGB\) color model to the \(YC_bC_r\) model. This separates the colors into luminance and chrominance values that we will consider separately. In Figure 3.3.6, we see the luminance values of this block. Again, notice how these values do not vary significantly over the block.
Our strategy in the compression algorithm is to perform a change of basis to take advantage of the fact that the luminance values do not change significantly over the block. Rather than recording the luminance of each of the pixels, this change of basis will allow us to record the average luminance along with some information about how the individual colors vary from the average.
Let's look at the first column of luminance values, which is a vector in \(\mathbb R^8\text{:}\)
We will perform a change of basis so that we can describe this vector by the average of the luminance values and information about variations from the average.
The JPEG compression algorithm uses the Discrete Fourier Transform , which is defined using the basis \(\ccal\) whose basis vectors are
On first glance, this probably looks intimidating, but we can make sense of it by looking at these vectors graphically. Shown in Figure 3.3.7 are four of these basis vectors. Notice that \(\mathbf v_0\) is constantly 1, \(\mathbf v_1\) is relatively slowly varying, \(\mathbf v_2\) varies a little more rapidly, and \(\mathbf v_7\) varies quite rapidly. This is the main observation: the basis vectors vary at different rates with the first vectors varying relatively slowly.
These vectors form the basis \(\ccal\) for \(\mathbb R^8\text{.}\) Remember that \(\mathbf x\) is the vector of luminance values in the first column as seen on the right. We will write \(\mathbf x\) in the new coordinates
The coordinates \(F_j\) are called the Fourier coefficients of the vector \(\mathbf x\text{.}\)
Activity 3.3.3.
We will explore the influence that the Fourier coefficients have on the vector \(\mathbf x\text{.}\)
The diagram below enables you to vary three of the Fourier coefficients \(F_0\), \(F_3\), and \(F_7\) and observe the effect on \({\mathbf x}\).- Describe the effect on the vector \(\mathbf x\) when you vary \(F_0\text{.}\)
- Now observe the effect on \(\mathbf x\) when \(F_3\) and \(F_7\) are varied. Compare the effect of \(F_0\text{,}\) \(F_3\text{,}\) and \(F_7\text{.}\)
- If the vector \(\mathbf x\) shows only small variations, what would you expect to be true of the Fourier coefficients \(F_j\text{?}\)
-
The Sage cell below will construct the vector \(C_{\bcal}\text{,}\) which is denoted
C
, and its inverse \(C_{\bcal}^{-1}\text{,}\) which is denotedCinv
. Evaluate this Sage cell and notice that it prints the matrix \(C_{\bcal}^{-1}\text{.}\)Now look at the form of \(C_{\bcal}^{-1}\) and explain why \(F_0\) is the average of the luminance values in the vector \(\mathbf x\text{.}\) -
The Sage cell below defines the vector \(\mathbf x\text{,}\) which is the vector of luminance values in the first column, as seen in Figure 3.3.6. Use the cell below to find the vector \(\mathbf f\) of Fourier coefficients \(F_0,F_1,\ldots,F_7\text{.}\) If you have evaluated the cell above, you will still be able to refer to
C
andCinv
in this cell.Write the Fourier coefficients and discuss the relative sizes of the coefficients. -
We see that the coefficients \(F_6\) and \(F_7\text{,}\) which correspond to rapid variations in the luminance values, are quite small. Let's see what happens when we ignore them. Form a new vector of Fourier coefficients by rounding the coefficients to the nearest integer and setting \(F_6\) and \(F_7\) to zero. This is an approximation to \(\mathbf f\text{,}\) the vector of Fourier coefficients. Use the approximation to \(\mathbf f\) to form an approximation of the vector \(\mathbf x\text{.}\)
How much does your approximation differ from the actual vector \(\mathbf x\text{?}\)
-
When we ignore the Fourier coefficients corresponding to rapidly varying basis elements, we see that the vector \(\mathbf x\) that we reconstruct is very close to the original one. In fact, the luminance values in the approximation differ by at most one or two from the actual luminance values. Our eyes are not sensitive enough to detect this difference.
So far, we have concentrated on only one column in our \(8\times8\) block of luminance values. Let's now consider all of the columns. The following Sage cell defines a matrix called
luminance
, which is the \(8\times8\) matrix of luminance values. Find the \(8\times8\) matrix \(F\) whose columns are the Fourier coefficients of the columns of luminance values. -
Notice that the first row of this matrix consists of the Fourier coefficient \(F_0\) for each of the columns. Just as we saw before, the entries in this row do not change significantly as we move across the row. In the Sage cell below, write these entries in the vector \(\yvec\) and find the corresponding Fourier coefficients.
Up to this point, we have been working with the luminance values in one \(8\times8\) block of our image. We formed the Fourier coefficients for each of the columns of this block. Once we notice that the Fourier coefficients across a row are relatively constant, it seems reasonable to find the Fourier coefficients of the rows of the matrix of Fourier coefficients. Doing so leads to the matrix
If we were to look inside a JPEG image file, we would see lots of matrices like this. For each \(8\times 8\) block, there would be three matrices of Fourier coefficients of the rows of Fourier coefficients, one matrix for each of the luminance, blue chrominance, and red chrominance values. However, we store these Fourier coefficients as integers inside the JPEG file so we need to round off coefficients to the nearest integer, as shown here:
There are many zeroes in this matrix, and this observation is where we can save space when creating a JPEG image file: we will only record the nonzero Fourier coefficients.
In fact, when a JPEG file is created, there is a “quality” parameter that can be set, such as that shown in Figure 3.3.8. When the quality parameter is high, we will store many of the Fourier coefficients; when it is low, we will ignore more of them.
To see how this works, suppose the quality setting is relatively high. After rounding off the Fourier coefficients, we will set all of the coefficients whose absolute value is less than 2 to zero, which creates the matrix:
Notice that there are 12 Fourier coefficients, out of 64, that we need to record. Consequently, we only record \(12/64 \approx 19\%\) of the data.
If instead, the quality setting is relatively low, we set all of the Fourier coefficients whose absolute value is less than 4 to zero, creating the matrix:
Notice that there are only 5 nonzero Fourier coefficients that we need to record now, meaning we record only \(5/64\approx 8\%\) of the data. This will result in a smaller JPEG file describing the image.
With a lower quality setting, we have thrown away more information about the Fourier coefficients so the image will not be reconstructed as accurately. To see this, we can reconstruct the luminance values from the Fourier coefficients by converting back into the standard coordinate system. Rather than showing the luminance values themselves, we will show the difference in the original luminance values and the reconstructed luminance values. When the quality setting was high and we stored 12 Fourier coefficients, we find this difference to be
When the quality setting is lower and we store only 5 Fourier coefficients, the difference is
This demonstrates the trade off. With a high quality setting, we require more storage to save more of the data, but the reconstructed image is closer to the original. With the lower quality setting, we require less storage, but the reconstructed image differs more from the original.
If we remember that the visual information stored by the blue and red chrominance values is not as important as that contained in the luminance values, we feel safer in discarding more of the Fourier coefficients for the chrominance values resulting in a greater savings.
Shown in Figure 3.3.9 is the original image compared to a version stored with a very low quality setting. If you look carefully, you can individual \(8\times8\) blocks.
This description of the JPEG compression algorithm is meant to convey the ideas that underlie its construction. There are a few details, most notably about the rounding of the Fourier coefficents, that are not strictly accurate. The actual implementation is a little more complicated, but the presentation here conveys the spirit of the algorithm.
We have described the JPEG compression algorithm, which allows us to store image files using only a fraction of the data. Similar ideas are used to efficiently store digital music and video files.
Summary
This section has explored how appropriate changes in bases help us reconstruct an image using only a fraction of its data. This is known as image compression.
- There are several ways of representing colors, all of which use vectors in \(\mathbb R^3\text{.}\) We explored the \(RGB\) color model, which is appropriate in digital applications, and the \(YC_bC_r\) model, in which the most important visual information is conveyed by the \(Y\) coordinate, known as lumninance.
- We also explored a change of basis called the Discrete Fourier Transform. In the coordinate system that results, the first coordinate measures the average of the components of a vector. Subsequent components measure deviations from the average.
- We put both of these ideas to use in demonstrating the JPEG compression algorithm. An image is broken into \(8\times8\) blocks, and the colors into luminance, blue chrominance, and red chrominance. Applying the Discrete Fourier Transform allowed us to reconstruct a good approximation of the image using only a small number of Fourier coefficients.
Exercises 3.3.4Exercises
Consider the vector \(\mathbf x=\left[\begin{array}{r} 103 \\ 94 \\ 91 \\ 92 \\ 103 \\ 105 \\ 105 \\ 108 \end{array}\right] \text{.}\)
-
In the Sage cell below is a copy of the change of basis matrices that define the Fourier transform. Find the Fourier coefficients of \(\mathbf x\text{.}\)
- We will now form the vector \(\yvec\text{,}\) which is an approximation of \(\mathbf x\text{.}\) To do this, round all the Fourier coefficients of \(\mathbf x\) to the nearest integer to obtain \(\coords{\yvec}{\ccal}\text{.}\) If a coefficient has an absolute value less than one, set it equal to zero. Now find the vector \(\yvec\) and compare this approximation to \(\mathbf x\text{.}\) What is the error in this approximation?
- Repeat the last part of this problem, but set the rounded Fourier coefficients to zero if they have an absolute value less than five. Use it to create a second approximation of \(\mathbf x\text{.}\) What is the error in this approximation?
- Compare the number of nonzero Fourier coefficients that you have in the two approximations and compare the accuracy of the approximations. Using a few sentences, discuss the comparisons that you find.
There are several steps to the JPEG compression algorithm. The following questions examine the motivation behind some of them.
- What is the overall goal of the JPEG compression algorithm?
- Why do we convert colors from the the \(RGB\) color model to the \(YC_bC_r\) model?
- Why do we decompose the image into a collection of \(8\times8\) arrays of pixels?
- What role does the Discrete Fourier Transform play in the JPEG compression algorithm?
- Why is the information conveyed by the rapid-variation Fourier coefficients, generally speaking, less important than the slow-variation coefficients?
The Fourier transform that we used in this section is often called the Discrete Fourier Cosine Transform because it is defined using a basis \(\ccal\) consisting of cosine functions. There is also a Fourier Sine Transform defined using a basis \(\scal\) consisting of sine functions. For instance, in \(\mathbb R^4\text{,}\) the basis vectors of \(\scal\) are
We can think of these vectors graphically, as shown in Figure 3.3.10.
-
The Sage cell below defines the matrix
S
whose columns are the vectors in the basis \(\scal\) as well as the matrixC
whose columns form the basis \(\ccal\) used in the Fourier Cosine Transform.In the \(8\times8\) block of luminance values we considered in this section, the first column begins with the four entries 176, 181, 165, and 139, as seen in Figure 3.3.6. These form the vector \(\mathbf x=\fourvec{176}{181}{165}{139}\text{.}\) Find both \(\coords{\mathbf x}{\scal}\) and \(\coords{\mathbf x}{\ccal}\text{.}\)
- Write a sentence or two comparing the values for the Fourier Sine coefficients \(\coords{\mathbf x}{\scal}\) and the Fourier Cosine coefficients \(\coords{\mathbf x}{\ccal}\text{.}\)
- Suppose now that \(\mathbf x=\fourvec{100}{100}{100}{100}\text{.}\) Find the Fourier Sine coefficients \(\coords{\mathbf x}{\scal}\) and the Fourie Cosine coefficients \(\coords{\mathbf x}{\ccal}\text{.}\)
- Write a few sentences explaining why we use the Fourier Cosine Transform in the JPEG compression algorithm rather than the Fourier Sine Transform.
In Example 3.2.7, we looked at a basis for \(\mathbb R^4\) that we called the Haar wavelet basis. The basis vectors are
which may be understood graphically as in Figure 3.3.11. We will denote this basis by \(\wcal\text{.}\)
The change of coordinates from a vector \(\mathbf x\) in \(\mathbb R^4\) to \(\coords{\mathbf x}{\wcal}\) is called the Haar wavelet transform and we write
The coefficients \(H_1,H_2,H_3,H_4\) are called wavelet coefficients.
Let's work with the \(4\times4\) block of luminance values in the upper left corner of our larger \(8\times8\) block:
-
The following Sage cell defines the matrix
W
whose columns are the basis vectors in \(\wcal\text{.}\) If \(\mathbf x\) is the first column of luminance values in the \(4\times4\) block above, find the wavelet coefficients \(\coords{\mathbf x}{\wcal}\text{.}\) -
Notice that \(H_1\) gives the average value of the components of \(\mathbf x\) and \(H_2\) describes how the averages of the first two and last two components differ from the overall average. The coefficients \(H_3\) and \(H_4\) describe small-scale variations between the first two components and last two components, respectively.
If we set the last wavelet coefficients \(H_3=0\) and \(H_4=0\text{,}\) we obtain the wavelet coefficients \(\coords{\yvec}{\wcal}\) for a vector \(\yvec\) that approximates \(\mathbf x\text{.}\) Find the vector \(\yvec\) and compare it to the original vector \(\mathbf x\text{.}\)
- What impact does the fact that \(H_3=0\) and \(H_4=0\) have on the form of the vector \(\yvec\text{?}\) Explain how setting these coefficients to zero ignores the behavior of \(\mathbf x\) on a small scale.
-
In the JPEG compression algorithm, we looked at the Fourier coefficients of all the columns of luminance values and then performed a Fourier transform on the rows. The Sage cell below will perform the same operation using the wavelet transform; that is, it will first find the wavelet coefficients of each of the columns and then perform the wavelet transform on the rows. You only need to evaluate the cell to find the wavelet coefficients obtained in this way.
-
Now set all the wavelet coefficients equal to zero except those in the upper left \(2\times2\) block and use them to define the matrix
coeffs
in the Sage cell below. This has the effect of ignoring all of the small-scale differences. Evaluating this cell will recover the approximate luminance values. - Explain how the wavelet transform and this approximation can be used to create a lower resolution version of the image.
This kind of wavelet transform is the basis of the JPEG 2000 compression algorithm, which is an alternative to the usual JPEG algorithm.
In this section, we looked at the \(RGB\) and \(YC_bC_r\) color models. In this exercise, we will look at the \(HSV\) color model where \(H\) is the hue, \(S\) is the saturation, and \(V\) is the value of the color. All three quantities vary between 0 and 255.
The diagram below enables you to vary the three parameters \(H\), \(S\), and \(V\) in the \(HSV\) color model.- If you leave \(S\) and \(V\) at some fixed values, what happens when you change the value of \(H\text{?}\)
- Move the value \(V\) to the right and keep it fixed. Describe what happens when you vary the saturation \(S\) using a fixed hue \(H\) and value \(V\text{.}\)
- Describe what happens when \(H\) and \(S\) are fixed and \(V\) varies.
- How can you create white in this color model?
- How can you create black in this color model?
- Find an approximate range of hues that correspond to blue.
- Find an approximate range of hues that correspond to green.
The \(YC_bC_r\) color model concentrates the most important visual information in the luminance coordinate, which roughly measures the brightness of the color. The other two coordinates describe the hue of the color. By contrast, the \(HSV\) color model concentrates all the information about the hue in the \(H\) coordinate.
This is useful in computer vision applications. For instance, if we want a robot to detect a blue ball in its field of vision, we can specify a range of hue values to search for. If the lighting changes in the room, the saturation and value may change, but the hue will not. This increases the likelihood that the robot will still detect the blue ball across a wide range of lighting conditions.