Wednesday, July 22, 2009

Grrr... [Minor update]

Problems aren't over. FFmpeg still won't install. I will be fighting with this for a little bit longer.

I do still believe that this program will be absolutely valuable in my project, but that's really just faith talking. I hope I'm not wasting my time.

Tuesday, July 21, 2009

6 hours later... [Hating Linux]

And it's almost 9, and I finally get a breakthrough. A breakthrough in this case, however, is finally getting to the point where I can install FFmpeg. It has now been several hours of head-splitting, manual-reading stress just to get this far, and I haven't even installed it yet.

At least the guide that I'm reading has warned me that installation takes a long time, allowing me to go home now rather than get sucked into another couple hours of work. If it goes smoothly, maybe everything will be installed by the time I wake up tomorrow, and I can start my exploring bright and early.

I need a drink.

I would make a horrible slam poet... [Breakthrough]

It's been a very long time since I've posted anything.

It's been a very long time since I've done any actual work.

Today, my first day back from this extended vacation, is not going particularly fast.

However, I think I have something now.

FFmpeg, an open source program for working with videos, comes with a collection of libraries that appear to be designed to do exactly what I want to do.

It's legitimate and free, two things that appear to be lacking from everything else found on the internet dealing with this sort of thing.

The rest of today, Wednesday, and Thursday will now be devoted to exploring/playing with this.

By next week, I want to be writing actual code.

That is all.

Wednesday, July 1, 2009

The big picture... [Project Description]

Only minimal progress has been made in the area of understanding video compression. So let's go off in another direction for a little bit, shall we? I'm going to take a step back and look at the big picture of what my project is supposed/going to be.

On the EWU campus, the computer science department has a few professors and students who are working on the MANOME project. MANOME, which stands for Metropolitan Area Network Optimized Music Environment, is an Eastern project to develop a system by which musicians separated by large distances can play music with each other over a network as if they are in the same room. Obviously this is a very big undertaking, and in most regards, they are nowhere close to being successful.

There are several steps in the process of getting sound and video sent to another person. The light and sound waves that make up what we see and hear need to be converted from waves to discrete values (analog to digital). This is the job of the video camera and microphone. The digital signal sent from the video camera is very large, too large to send as raw data (there are a few hundred thousand pixels in one frame of a video, each requiring data to indicate the color and brightness of the pixel, and there are maybe 30 frames every second). For this reason, the video needs to be compressed before it is sent (this is where all the stuff I've been doing so far this summer comes into play). Compression involves transforming the data into a smaller representation so that it doesn't take as much room (I'm starting to understand this more, but not enough to print it here).
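To put a rough number on "too large to send as raw data," here's a quick back-of-the-envelope calculation. The 640x480 resolution, 24-bit color, and 30 fps are my own assumed figures for a typical webcam, not anything MANOME has settled on:

```python
# Rough raw (uncompressed) video bitrate, using assumed webcam numbers.
width, height = 640, 480     # assumed resolution
bits_per_pixel = 24          # 8 bits each for red, green, blue
frames_per_second = 30

bits_per_frame = width * height * bits_per_pixel
bits_per_second = bits_per_frame * frames_per_second

print(f"{bits_per_frame / 8 / 1024:.0f} KiB per frame")
print(f"{bits_per_second / 1e6:.1f} Mbit/s raw")
```

That's over 200 Mbit/s before compression, which is why nobody sends raw video over a home connection.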

The compressed data is then sent over the network. The network as we know it is slow because most of us use the internet through a common ISP such as Comcast. Every time you want to watch a video on the internet, such as through YouTube, you click play and wait a little bit of time for it to start. It might be less than a second if your connection is fast, but it still takes time (remember that MANOME needs its video to be instantaneous). If even YouTube is too slow, then we're in trouble, right? Well, it turns out that there is such a thing as the Internet2 network, a very high-speed network that is currently used for research and education. For example, according to the Internet2 website: in 2003, over a terabyte of data was transferred over 4300 miles from California to Switzerland in under 30 minutes, an average speed of 5.44 gigabits per second.
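As a sanity check on that figure, here's what 5.44 gigabits per second for 30 minutes works out to:

```python
# Sanity check on the Internet2 figure quoted above:
# how much data does 5.44 Gbit/s move in 30 minutes?
gigabits_per_second = 5.44
seconds = 30 * 60

terabytes = gigabits_per_second * 1e9 * seconds / 8 / 1e12
print(f"{terabytes:.2f} TB in 30 minutes")
```

It comes out to a bit over 1.2 TB, which is consistent with "over a terabyte in under 30 minutes."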

So, assuming we use Internet2, the compressed digital video and audio signal is now at the new destination. Here it needs to be decompressed. Difficulty in decompression depends on the difficulty in compression. For example, YouTube videos can be started so fast because the video has been compressed in such a way that decompression can be done very quickly. However, in order to do this, the act of compression is much slower. This is known as asymmetrical compression and decompression and is perfect for creating video files and streaming stored video, but completely wrong for live streaming video. In order to do it live, the compression and decompression methods need to be symmetric. I will go into some different compression methods in a different post.

Finally, the decompressed digital signal can be turned back into the light and audio waves through the monitor and speakers. Of course, this is not to mention that the entire process needs to simultaneously be done in reverse so that both people can see and hear the other person.

Most of what I have described here is already done using webcams and video chat - everything except it being fast enough. One easy way to make things faster is to degrade the video quality. Take away color and you drop a pixel from 24 bits (millions of possible colors) to 8 bits (256 shades of gray), or even fewer. Another way is to reduce the frame rate so that fewer pictures need to be sent per second. Yet another way is to lower the resolution of the picture so that fewer pixels need to be described for each picture.
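A little Python to show how much each of those tricks saves, measured against an assumed raw baseline (the 640x480, 24-bit, 30 fps numbers are mine, purely for illustration):

```python
# Compare quality-degradation tricks against an assumed raw baseline.
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Bits per second of uncompressed video."""
    return width * height * bits_per_pixel * fps

baseline = raw_bitrate(640, 480, 24, 30)

grayscale   = raw_bitrate(640, 480, 8, 30)   # drop color: 24 -> 8 bits/pixel
half_fps    = raw_bitrate(640, 480, 24, 15)  # halve the frame rate
quarter_res = raw_bitrate(320, 240, 24, 30)  # halve width and height

for name, rate in [("grayscale", grayscale), ("15 fps", half_fps),
                   ("320x240", quarter_res)]:
    print(f"{name}: {rate / baseline:.0%} of baseline")
```

So grayscale cuts the data to a third, halving the frame rate cuts it in half, and quartering the pixel count cuts it to a quarter - and the tricks can be stacked.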

However, doing any of these things can be problematic. Take away too much detail and the video becomes unusable. It would be a bad thing for the MANOME researchers to spend a long time building a system that includes video, only to find out that the video is no good for musicians. So my primary goal this summer is to find an optimum value, a threshold, for video quality: find out how pixelated and choppy we can make a video and still have it be of good value to the musician.

So creating and implementing user tests with different qualities of video is where I'm headed. Of course, none of this can be done until I figure out how to manipulate the quality of videos, either at creation time or by modifying a saved video file. It would be nice to talk to someone who knows more about this sort of thing, but I'm not really sure where to ask at the moment. For now, I appear to be stuck here reading until I find the answer.

Wednesday, June 24, 2009

So you're saying that the DCT is NOT a waste of time? [Learning something]

For a day devoted to understanding the discrete cosine transform, I have only a limited amount of knowledge to show for it. But I'm going to go ahead and explain what I think I know, and what I don't.

The DCT is basically an equation used to take a series of numbers and transform them into a form that takes a smaller bitrate to store or send over the internet. Compression is valuable when it comes to sending movies and music, and almost all common file formats for them (such as mpg and mp3) utilize the DCT.

When it comes to movies, which is what I am interested in, a frame (or picture) consists of values that can be thought of as a matrix. It appears to be significantly more complicated than that, but for the purposes of today's blog, this is sufficient. There are so many of these values that a video file of raw data would take up way too much space on a computer, and it would take way too long to send over the internet.

DCT compresses it by representing this data as the coefficients of a number of cosine waves with different frequencies. For reasons that I can't explain completely, any set of data points can be represented by the addition of these various cosine waves. So, for example, one line of 8 data points can be used as input into this DCT equation, and the result is a new line of 8 data points.
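To make the "8 data points in, 8 data points out" idea concrete, here's a sketch of the 1-D DCT and its inverse in Python. I'm using the orthonormal DCT-II convention; textbooks differ on the scaling constants, so this may not match the MPEG book's version exactly, and the sample values are ones I made up:

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II: N samples in, N coefficients out."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse (DCT-III): recovers the original samples from dct()."""
    N = len(X)
    return [sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # one made-up row of pixel values
coeffs = dct(samples)
restored = idct(coeffs)

print([round(c, 2) for c in coeffs])
print([round(r, 2) for r in restored])   # matches samples (up to rounding)
```

Running the inverse on the output gets the original 8 values back, which is the same reversibility I verified in the spreadsheet.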

One major point of confusion as I read this in my book is that the DCT equation is fairly complex, requiring lots of multiplication and division to compute each value in the new data set. And the resulting data doesn't seem useful at all. But ignoring my confusion over why it works, I just set out to prove to myself that it actually does work. I started something in Java, but quickly realized that it was going to be way easier to produce and visualize in Excel.

The Google Docs version of my result (minus the graphs) can be found here. It's not intuitive to read, but in case you do: The upper left corner shows the input and just below it shows the new values. Then just below that, the new values are transformed back to show that the process is reversible (or decodable). The large matrices on the right are values that are computed in order to compute the decoded values below. They are visible so that you can see what goes on as values are changed. They are meaningless except to look at the formulas that make them up.

The two finished pages in that document show the 1-D DCT and its inverse. I started making the 2-dimensional version and then realized that I would need 64 8x8 matrices (or one 4-d matrix) to keep all of the intermediate values. I don't need to do that anyway. I understand how it works now.

But now back to why it works. I sent an e-mail to Paul back at EWU and he explained that the transformed data is easier to truncate (plus you can do some other things with it that I don't understand). Truncating makes sense. When it comes to pictures and video, there is going to be data that the eye cannot see or that the brain cannot process quickly enough. This new data is essentially a representation of different frequency waves. Perhaps if you truncate the higher frequencies, you now have less data to send and have only changed the picture or movie in ways that are not noticeable.
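Sticking with an 8-point example, here's a quick Python sketch of that truncation idea: throw away the highest-frequency half of the coefficients and see how far the reconstruction drifts. The DCT convention (orthonormal DCT-II) and the sample values are my own choices, not from the book:

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse (DCT-III)."""
    N = len(X)
    return [sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(samples)

# Truncate: zero out the four highest-frequency coefficients,
# keeping only half the data.
kept = coeffs[:4] + [0.0] * 4
approx = idct(kept)

worst = max(abs(a - b) for a, b in zip(samples, approx))
print([round(v, 1) for v in approx])
print(f"worst sample error: {worst:.1f}")
```

The reconstruction is no longer exact, but each value is only off by a handful of gray levels out of 256 - the kind of error that might be invisible to the eye, which seems to be the whole point.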

From what I've read, this is the start to why DCT is useful, but it is far from the complete reason. I think the next chapter is going to help me understand as it is on the subject of what the brain can and cannot see. I still don't know how useful this knowledge is going to be to my final project, but at least it's something.

Working the DCT... [Starting programming]

It has indeed been difficult to work in my office, especially now that I have run out of valid excuses to delay working. Now there's no more pretense; I just don't want to do it. Basically, because I'm not exactly sure what I'm doing, it's difficult to justify anything as worth my time.

That said, my reading of the MPEG book took a downward turn when it devoted an entire chapter to the discrete cosine transform. I have a BS in math, but I don't recall ever learning this. So I looked it up and tried to reconcile it with what the book is doing. Things don't perfectly match up, but I'm not entirely sure if I need to understand the tiny details of how it works.

But I really do want to understand, because understanding will help me be able to explain exactly why the discrete cosine transform is so valuable in video encoding and decoding. So I think today I'll write a program that does DCTs. I'd rather just do some problems by hand, but that appears to be a little more complicated, because the equations are a bit more involved than anything in a high school textbook.

Besides, doing some coding will make me feel like I'm being productive. Even when it might turn out to be a complete waste of time.

Monday, June 22, 2009

And we're back... [New week, new toys]

Took a three-day weekend, but it wasn't due to slacking. In fact, I proposed to my girlfriend of 18+ months, Jackie, on Friday afternoon when she got home from work, and we got to spend the entire weekend celebrating (She said yes, btw). There will be a video of the proposal coming soon for those interested. For now here is a picture of us a few hours after the proposal and a night of drinking with friends.
But now it's time to start getting into the rhythm of work. However, that's going to be a little bit difficult, because now I have my own room in the CSE building in which to work! Larry and Jim totally came through on their efforts to provide me with almost everything I could want for this summer. My room has some computers that aren't being used, so I hooked up one of their monitors to my laptop. There's lots of storage space, and I'm the only one who's going to be in here. So as you can tell, there's no way I'll get work done now. There are just too many neat things to play with.

Now I have a new goal for my previous list: Bring people into my room to say hello. People seem to be abnormally private and antisocial around here, so this has been difficult thus far.