Wednesday, June 24, 2009

So you're saying that the DCT is NOT a waste of time? [Learning something]

For a day devoted to understanding the discrete cosine transform, I have only a limited amount of knowledge to show for it. But I'm going to go ahead and explain what I think I know, and what I don't know.

The DCT is basically an equation used to take a series of numbers and turn them into a form that can be stored or sent over the internet at a smaller bitrate. Compression is valuable when it comes to sending movies, pictures, and audio, and many of the common file formats for these three things (such as mpg and mp3) make use of the DCT or a close variant of it.

When it comes to movies, which is what I am interested in, a frame (or picture) can be thought of as a matrix of values. The real thing appears to be significantly more complicated than that, but for the purposes of today's blog, this is sufficient. There are so many of these values that storing a video file as raw data would take up way too much space on a computer, and sending it over the internet would take way too long.

The DCT helps by representing this data as the coefficients of a number of cosine waves with different frequencies. For reasons that I can't explain completely, any set of data points can be represented as a weighted sum of these cosine waves. So, for example, one line of 8 data points can be used as input into this DCT equation, and the result is a new line of 8 values, one coefficient per cosine wave.
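
For anyone who prefers code to spreadsheets, here is a minimal Python sketch of that 8-in, 8-out step. It is not the formula copied from my book: it assumes the common orthonormal DCT-II scaling, and the function name dct_1d and the sample values are made up for illustration.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of numbers (assumed scaling, see above)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        # Scale factor that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        # Correlate the input with a cosine wave of frequency index k.
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs

row = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up line of 8 data points
coeffs = dct_1d(row)                    # 8 coefficients, one per cosine wave
print([round(c, 2) for c in coeffs])
```

With this scaling, the first coefficient is just the sum of the inputs divided by the square root of 8, which is why it tends to be much larger than the rest.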

One major point of confusion as I read this in my book is that the DCT equation is fairly complex, requiring lots of multiplication and division to compute each value in the new data set. And the resulting data doesn't seem useful at all. But ignoring my confusion over why it works, I just set out to prove to myself that it actually does work. I started something in Java, but quickly realized that it was going to be way easier to produce and visualize in Excel.

The Google Docs version of my result (minus the graphs) can be found here. It's not intuitive to read, but in case you do: The upper left corner shows the input and just below it shows the new values. Then just below that, the new values are transformed back to show that the process is reversible (or decodable). The large matrices on the right are values that are computed in order to compute the decoded values below. They are visible so that you can see what goes on as values are changed. They are meaningless except to look at the formulas that make them up.
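
The same round trip can be sketched in a few lines of Python. This is not the spreadsheet itself: the names dct_1d and idct_1d, the orthonormal scaling, and the input values are all assumptions chosen for illustration.

```python
import math

def scale(k, n):
    # Orthonormal scaling (an assumed convention, other texts scale differently).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(samples):
    """Forward transform: data points -> cosine coefficients."""
    n = len(samples)
    return [
        scale(k, n) * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                          for i, x in enumerate(samples))
        for k in range(n)
    ]

def idct_1d(coeffs):
    """Inverse transform: cosine coefficients -> data points."""
    n = len(coeffs)
    return [
        sum(scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

original = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up input row
decoded = idct_1d(dct_1d(original))
# Largest difference between the input and the decoded values.
max_diff = max(abs(a - b) for a, b in zip(original, decoded))
print(max_diff)
```

The difference should only be floating-point noise, which is the "reversible" part in code form.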

The two finished pages in that document show the 1-D DCT and its inverse. I started making the 2-dimensional version and then realized that I would need 64 8x8 matrices (or one 4-d matrix) to keep all of the intermediate values. I don't need to do that anyway. I understand how it works now.
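
One way around all that bookkeeping, assuming the 2-D transform in question is the usual separable one, is to run the 1-D DCT over every row and then over every column of the result. Below is a hedged Python sketch of that idea; dct_2d is a made-up name, and the orthonormal scaling is again an assumption.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of one row or column (assumed scaling)."""
    n = len(samples)
    out = []
    for k in range(n):
        s = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(s * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i, x in enumerate(samples)))
    return out

def dct_2d(block):
    """2-D DCT done as 1-D passes: rows first, then columns."""
    row_pass = [dct_1d(row) for row in block]
    # zip(*m) transposes, so each 'col' here is one column of row_pass.
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    # Transpose back so the result is indexed [vertical][horizontal].
    return [list(r) for r in zip(*col_pass)]

flat_block = [[10] * 8 for _ in range(8)]  # made-up 8x8 block, all one value
result = dct_2d(flat_block)
print(round(result[0][0], 6))
```

A completely flat block ends up with all of its energy in the top-left coefficient and zeros (up to rounding) everywhere else.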

But now back to why it works. I sent an e-mail to Paul back at EWU and he explained that the transformed data is easier to truncate (plus you can do some other things with it that I don't understand). Truncating makes sense. When it comes to pictures and video, there is going to be data that the eye cannot see or that the brain cannot process quickly enough. This new data is essentially a representation of waves of different frequencies. Perhaps if you truncate the higher frequencies, you have less data to send and have only changed the picture or movie in ways that are not noticeable.
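
To make the truncation idea concrete, here is a small Python sketch that throws away the four highest-frequency coefficients of an 8-point row and rebuilds the row from what is left. Everything in it is an assumption for illustration: the orthonormal scaling, the helper name truncate_and_rebuild, and the smooth ramp used as input. Real codecs quantize coefficients rather than simply zeroing them, so treat this as the idea only.

```python
import math

def scale(k, n):
    # Orthonormal scaling (assumed convention).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(samples):
    n = len(samples)
    return [
        scale(k, n) * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                          for i, x in enumerate(samples))
        for k in range(n)
    ]

def idct_1d(coeffs):
    n = len(coeffs)
    return [
        sum(scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def truncate_and_rebuild(samples, keep):
    """Keep the first `keep` coefficients, zero the rest, and decode."""
    coeffs = dct_1d(samples)
    kept = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct_1d(kept)

ramp = [10, 20, 30, 40, 50, 60, 70, 80]  # made-up smooth input
approx = truncate_and_rebuild(ramp, keep=4)
# Largest per-sample difference caused by dropping half the coefficients.
worst_error = max(abs(a - b) for a, b in zip(ramp, approx))
print([round(v, 1) for v in approx], round(worst_error, 2))
```

On smooth data like this ramp, the dropped coefficients are small, so the rebuilt row stays close to the original even though only half the numbers were kept. Data with sharp jumps would not fare as well.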

From what I've read, this is the start of why the DCT is useful, but it is far from the complete reason. I think the next chapter is going to help me understand, as it is on the subject of what the brain can and cannot see. I still don't know how useful this knowledge is going to be to my final project, but at least it's something.
