Wednesday, July 1, 2009

The big picture... [Project Description]

Only minimal progress has been made in the area of understanding video compression. So let's go off in another direction for a little bit, shall we? I'm going to take a step back and look at the big picture of what my project is supposed/going to be.

On the EWU campus, the computer science department has a few professors and students who are working on the MANOME project. MANOME, which stands for Metropolitan Area Network Optimized Music Environment, is an Eastern project to develop a system by which musicians separated by large distances can play music with each other over a network as if they are in the same room. Obviously this is a very big undertaking, and in most regards, they are nowhere close to being successful.

There are several steps in the process of getting sound and video sent to another person. The light and sound waves that make up what we see and hear need to be converted from waves to discrete values (analog to digital). This is the job of the video camera and microphone. The digital signal sent from the video camera is very large, too large to send as raw data (there are hundreds of thousands of pixels in one frame of video, each requiring data to indicate its color and brightness, and there are maybe 30 frames every second). For this reason, the video needs to be compressed before it is sent (this is where all the stuff I've been doing so far this summer comes into play). Compression means transforming the data into a smaller representation so that it takes less room (I'm starting to understand this more, but not enough to explain it properly here).
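Just to get a feel for the numbers, here is a quick back-of-the-envelope calculation in Python. The 640x480 resolution, 24-bit color, and 30 frames per second are only example values I'm assuming, not anything specific to MANOME's cameras:

    # Rough size of a raw (uncompressed) video stream.
    width, height = 640, 480        # example resolution
    bits_per_pixel = 24             # 8 bits each for red, green, and blue
    frames_per_second = 30

    bits_per_frame = width * height * bits_per_pixel
    bits_per_second = bits_per_frame * frames_per_second

    print("Pixels per frame: %d" % (width * height))           # 307200
    print("Raw rate: %.1f Mbit/s" % (bits_per_second / 1e6))   # about 221.2 Mbit/s

Even at that modest resolution, the raw stream is far bigger than a typical home connection can carry, which is exactly why compression matters.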

The compressed data is then sent over the network. The network as most of us know it is slow because we reach the internet through a common ISP such as Comcast. Every time you want to watch a video on the internet, such as through YouTube, you click play and wait a little bit of time for it to start. It might be less than a second if your connection is fast, but it still takes time (remember that MANOME needs its video to be instantaneous). If even YouTube is too slow, then we're in trouble, right? Well, it turns out that there is such a thing as the Internet2 network, a very high-speed network that is currently used for research and education. For example, according to the Internet2 website, in 2003 over a terabyte of data was transferred 4300 miles from California to Switzerland in under 30 minutes, an average speed of 5.44 gigabits per second.
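To put that speed in perspective, here is a small comparison. The 1 Mbit/s "home upload" figure is just an assumption for the sake of the example:

    terabyte_in_bits = 8e12      # 1 terabyte expressed in bits
    internet2_speed = 5.44e9     # bits per second, from the record above
    home_upload = 1e6            # assumed 1 Mbit/s home upload speed

    print("Internet2: %.1f minutes" % (terabyte_in_bits / internet2_speed / 60))   # about 24.5 minutes
    print("Home ISP:  %.1f days" % (terabyte_in_bits / home_upload / 86400))       # about 92.6 days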

So, assuming we use Internet2, the compressed digital video and audio signal is now at the new destination. Here it needs to be decompressed. How hard decompression is depends on how the compression was done. For example, YouTube videos can start so quickly because the video has been compressed in such a way that decompression can be done very quickly; the trade-off is that the compression step is much slower. This is known as asymmetric compression and decompression, and it is perfect for creating video files and streaming stored video, but completely wrong for live video. To do it live, the compression and decompression methods need to be symmetric. I will go into some different compression methods in a different post.
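As a toy illustration of what "symmetric" means, here is run-length encoding in Python: compressing and decompressing each make a single pass over the data, so neither side costs much more than the other. Real video codecs are vastly more sophisticated; this is only to make the idea concrete.

    def rle_compress(data):
        """Collapse runs of repeated values into [value, count] pairs."""
        out = []
        for value in data:
            if out and out[-1][0] == value:
                out[-1][1] += 1
            else:
                out.append([value, 1])
        return out

    def rle_decompress(pairs):
        """Expand [value, count] pairs back into the original sequence."""
        out = []
        for value, count in pairs:
            out.extend([value] * count)
        return out

    pixels = [0, 0, 0, 255, 255, 0, 0, 0, 0]
    packed = rle_compress(pixels)            # [[0, 3], [255, 2], [0, 4]]
    assert rle_decompress(packed) == pixels

An asymmetric codec, by contrast, deliberately pushes the expensive work onto the encoder so that playback stays cheap, which is great for stored video but no good for a live two-way session.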

Finally, the decompressed digital signal can be turned back into light and sound waves through the monitor and speakers. And of course, the entire process needs to happen simultaneously in both directions so that each person can see and hear the other.

Most of what I have described here is already done using webcams and video chat - everything except it being fast enough. One easy way to make things faster is to degrade the video quality. Take away color, and a pixel that needed 24 bits (8 bits each for red, green, and blue) can be described with a single 8-bit grayscale value, or even fewer bits. Another way is to reduce the frame rate so that fewer pictures need to be sent per second. Yet another way is to lower the resolution so that fewer pixels need to be described for each picture.
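Those tricks multiply together. A quick sketch of the potential savings, where all the "reduced" numbers below are just illustrative assumptions, not measured thresholds:

    def raw_rate(width, height, bits_per_pixel, fps):
        """Uncompressed bits per second for a video stream."""
        return width * height * bits_per_pixel * fps

    full = raw_rate(640, 480, 24, 30)      # color, full resolution, 30 fps
    reduced = raw_rate(320, 240, 8, 15)    # grayscale, quarter resolution, 15 fps

    print("Full:    %.1f Mbit/s" % (full / 1e6))        # about 221.2
    print("Reduced: %.1f Mbit/s" % (reduced / 1e6))     # about 9.2
    print("Savings: %.0fx" % (float(full) / reduced))   # 24x

Of course, the real question for my project is how much of that reduction a musician can tolerate before the video stops being useful.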

However, doing any of these things can be problematic. Take away too much detail in the video and it will become unusable. It would be a bad thing for the MANOME researchers to work for a long time to build a system that includes video and find out that the video is no good for musicians. So my primary goal this summer is to find an optimum value, a threshold, for video quality. Find out how pixelated and choppy we can make a video and still let it be of good value to the musician.

So creating and implementing user tests with different qualities of videos is where I'm headed. Of course, none of this can be done until I figure out how to manipulate the quality of videos, either at creation time or by modifying a saved video file. It would be nice to talk to someone who knew more about this sort of thing, but I'm not really sure where to ask at the moment. For now, I appear to be stuck here reading until I find the answer.
