Wednesday, July 22, 2009

Grrr... [Minor update]

Problems aren't over. FFmpeg still won't install. I will be fighting with this for a little bit longer.

I do still believe that this program will be absolutely valuable in my project, but that's really just faith talking. I hope I'm not wasting my time.

Tuesday, July 21, 2009

6 hours later... [Hating Linux]

And it's almost 9, and I finally get a breakthrough. A breakthrough in this case, however, is finally getting to the point where I can install FFmpeg. It has now been several hours of head-splitting, manual-reading stress just figuring it out this far, and I haven't even installed it yet.

At least the guide that I'm reading has warned me that installation takes a long time, thus allowing me to go home now rather than get sucked into another couple hours of work. If it goes smoothly, maybe everything will be installed by the time I wake up tomorrow, and I can start my exploring bright and early.

I need a drink.

I would make a horrible slam poet... [Breakthrough]

It's been a very long time since I've posted anything.

It's been a very long time since I've done any actual work.

Today, my first day back from this extended vacation, is not going particularly fast.

However, I think I have something now.

FFmpeg, an open source program for working with videos, comes with a collection of libraries that appear to be designed to do exactly what I want to do.

It's legitimate and free, two things that appear to be lacking from everything else found on the internet dealing with this sort of thing.

The rest of today, Wednesday, and Thursday will now be devoted to exploring/playing with this.

By next week, I want to be writing actual code.

That is all.

Wednesday, July 1, 2009

The big picture... [Project Description]

Only minimal progress has been made in the area of understanding video compression. So let's go off in another direction for a little bit, shall we? I'm going to take a step back and look at the big picture of what my project is supposed/going to be.

On the EWU campus, the computer science department has a few professors and students who are working on the MANOME project. MANOME, which stands for Metropolitan Area Network Optimized Music Environment, is an Eastern project to develop a system by which musicians separated by large distances can play music with each other over a network as if they are in the same room. Obviously this is a very big undertaking, and in most regards, they are nowhere close to being successful.

There are several steps in the process of getting sound and video sent to another person. The light and sound waves that make up what we see and hear need to be converted from waves to discrete values (analog to digital). This is the job of the video camera and microphone. The digital signal sent from the video camera is very large, too large to send as raw data (there are hundreds of thousands of pixels in one frame of even a modest video, each requiring data to indicate the color and brightness of the pixel, and there are maybe 30 frames every second). For this reason, the video needs to be compressed before it is sent (this is where all the stuff I've been doing so far this summer comes into play). Compression involves re-encoding the data in a form that takes up less room (I'm starting to understand this more, but not enough to print it here).
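To put a rough number on "too large to send as raw data", here is a minimal sketch of the arithmetic. The frame size, color depth, and frame rate are assumed example values (a typical 640x480 webcam picture with 24-bit color), not measurements from the MANOME setup.

```python
def raw_video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed data rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed example values: 640x480 pixels, 8 bits each for red, green, and blue,
# and 30 frames every second.
rate = raw_video_bits_per_second(640, 480, 24, 30)

print(f"{640 * 480} pixels per frame")
print(f"{rate / 1e6:.1f} megabits per second uncompressed")
```

Even at this modest picture size, the uncompressed stream is far beyond what an ordinary home connection carries, which is why compression sits in the middle of the pipeline.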

The compressed data is then sent over the network. The network as we know it is slow because we use the internet through your common ISP such as Comcast. Every time you want to watch a video on the internet, such as through YouTube, you click play and wait a little bit of time for it to start. It might be less than a second if your connection is fast, but it still takes time (remember that MANOME needs its video to be instantaneous). If even YouTube is too slow, then we're in trouble, right? Well, it turns out that there is such a thing as the Internet2 network, a very high speed internet that is currently used for research and education. For example, according to the Internet2 website: in 2003, over a terabyte of data was transferred over 4300 miles from California to Switzerland in under 30 minutes, showing an average speed of 5.44 gigabits per second.
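The quoted Internet2 figures can be sanity-checked with a couple of lines. This sketch assumes a decimal terabyte (10^12 bytes) and treats 5.44 gigabits per second as a steady average, so it is only a rough check.

```python
def transfer_seconds(data_bytes, bits_per_second):
    """Time needed to move data_bytes over a link running at bits_per_second."""
    return data_bytes * 8 / bits_per_second


TERABYTE = 10**12  # assumption: decimal terabyte

seconds = transfer_seconds(TERABYTE, 5.44e9)
print(f"one terabyte at 5.44 Gbit/s takes about {seconds / 60:.1f} minutes")
```

The result lands under 30 minutes, so the numbers are consistent with each other.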

So, assuming we use Internet2, the compressed digital video and audio signal is now at the new destination. Here it needs to be decompressed. Difficulty in decompression depends on the difficulty in compression. For example, YouTube videos can be started so fast because the video has been compressed in such a way that decompression can be done very quickly. However, in order to do this, the act of compression is much slower. This is known as asymmetrical compression and decompression and is perfect for creating video files and streaming stored video, but completely wrong for live streaming video. In order to do it live, the compression and decompression methods need to be symmetric. I will go into some different compression methods in a different post.

Finally, the decompressed digital signal can be turned back into the light and audio waves through the monitor and speakers. Of course, this is not to mention that the entire process needs to simultaneously be done in reverse so that both people can see and hear the other person.

Most of what I have described here is already done using webcams and video chat - everything except it being fast enough. One easy way to make things faster is to degrade the video quality. Take away color and you have suddenly dropped the data needed to describe a pixel from 24 bits to 8 (assuming 8 bits each for red, green, and blue versus 8 bits of gray). Another way is to reduce the frame rate so that fewer pictures need to be sent per second. Yet another way is to change the resolution of the picture so that fewer pixels need to be described for each picture.
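A small sketch makes the three options easy to compare. All of the settings are assumed example values, and the comparison is of raw (uncompressed) data rates only; real savings after compression will differ.

```python
from dataclasses import dataclass


@dataclass
class VideoSettings:
    width: int
    height: int
    bits_per_pixel: int
    frames_per_second: int

    def bits_per_second(self):
        return self.width * self.height * self.bits_per_pixel * self.frames_per_second


# Assumed baseline: 640x480, 24-bit color, 30 frames per second.
baseline = VideoSettings(640, 480, 24, 30)

# Each variant changes exactly one of the three knobs.
variants = {
    "grayscale (24 -> 8 bits per pixel)": VideoSettings(640, 480, 8, 30),
    "half frame rate (30 -> 15 fps)": VideoSettings(640, 480, 24, 15),
    "half resolution (320x240)": VideoSettings(320, 240, 24, 30),
}

ratios = {
    name: settings.bits_per_second() / baseline.bits_per_second()
    for name, settings in variants.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%} of the baseline data rate")
```

Halving the width and height cuts the pixel count to a quarter, so of these three changes resolution is the biggest lever.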

However, doing any of these things can be problematic. Take away too much detail in the video and it will become unusable. It would be a bad thing for the MANOME researchers to work for a long time to build a system that includes video and find out that the video is no good for musicians. So my primary goal this summer is to find an optimum value, a threshold, for video quality. Find out how pixelated and choppy we can make a video and still let it be of good value to the musician.

So creating and implementing user tests with different qualities of videos is where I'm headed. Of course, none of this can be done until I figure out how to manipulate the quality of videos, either at creation time or by modifying a saved video file. It would be nice to talk to someone who knew more about this sort of thing, but I'm not really sure where to ask at the moment. For now, I appear to be stuck here reading until I find the answer.
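For what it's worth, the FFmpeg command-line tool that comes up in the July posts can change all three of these properties when it re-encodes a saved file. The sketch below only builds the command and prints it. The file names are hypothetical, and the options (`-vf scale=...`, `-vf format=gray`, `-r`) are from current FFmpeg builds, so an older build may want different flags.

```python
import shlex


def build_ffmpeg_command(source, destination, width=None, height=None,
                         frames_per_second=None, grayscale=False):
    """Build an ffmpeg argument list that degrades a video in up to three ways."""
    filters = []
    if width and height:
        filters.append(f"scale={width}:{height}")  # change the resolution
    if grayscale:
        filters.append("format=gray")              # drop the color information

    command = ["ffmpeg", "-i", source]
    if filters:
        command += ["-vf", ",".join(filters)]
    if frames_per_second:
        command += ["-r", str(frames_per_second)]  # output frame rate
    command.append(destination)
    return command


# Hypothetical file names, for illustration only.
command = build_ffmpeg_command("input.mp4", "degraded.mp4",
                               width=320, height=240,
                               frames_per_second=15, grayscale=True)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would actually run it, assuming FFmpeg is installed and on the path.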

Wednesday, June 24, 2009

So you're saying that the DCT is NOT a waste of time? [Learning something]

For a day devoted to understanding the discrete cosine transform, I have only a limited amount of knowledge to show for it. But I'm going to go ahead and explain what I think I know, and what I don't know.

The DCT is basically an equation used to take a series of numbers and turn them into something that takes fewer bits to store or send over the internet. Compression is valuable when it comes to sending movies, music, and other audio, and almost all forms of files for these three things (such as mpg and mp3) utilize DCT.

When it comes to movies, which is what I am interested in, a frame (or picture) consists of values that can be thought of in terms of a matrix of values. It appears to be significantly more complicated than that, but for the purposes of today's blog, this is sufficient. There are so many of these values that it would take up way too much space on a computer to store a video file with raw data, and it would take way too long to send it over the internet.

DCT compresses it by representing this data as the coefficients of a number of cosine waves with different frequencies. For reasons that I can't explain completely, any set of data points can be represented by the addition of these various cosine waves. So, for example, one line of 8 data points can be used as input into this DCT equation, and the result is a new line of 8 data points.
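Here is a minimal sketch of the one-line case: 8 values in, 8 coefficients out, and back again. It uses the common DCT-II formula with orthonormal scaling, which may differ by a constant factor from the version in the MPEG book, and the input values are made up.

```python
import math


def dct_1d(samples):
    """Forward DCT-II with orthonormal scaling."""
    n_points = len(samples)
    coefficients = []
    for k in range(n_points):
        scale = math.sqrt(1 / n_points) if k == 0 else math.sqrt(2 / n_points)
        total = sum(
            value * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
            for n, value in enumerate(samples)
        )
        coefficients.append(scale * total)
    return coefficients


def idct_1d(coefficients):
    """Inverse of dct_1d: add up the scaled cosine waves at each sample position."""
    n_points = len(coefficients)
    samples = []
    for n in range(n_points):
        total = 0.0
        for k, coefficient in enumerate(coefficients):
            scale = math.sqrt(1 / n_points) if k == 0 else math.sqrt(2 / n_points)
            total += scale * coefficient * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
        samples.append(total)
    return samples


row = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up pixel values for one line
coefficients = dct_1d(row)
restored = idct_1d(coefficients)

print([round(c, 2) for c in coefficients])
print([round(v) for v in restored])
```

The first coefficient is just a scaled average of the line, and each later coefficient says how much of a progressively faster cosine wave is needed to rebuild it.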

One major point of confusion as I read this in my book is that the DCT equation is fairly complex, requiring lots of multiplication and division in order to compute each new value in the new data set. And the resulting data doesn't seem useful at all. But ignoring my confusion over why it works, I just set out to prove to myself that it actually does work. I started something in Java, but quickly realized that it was going to be way easier to produce and visualize in Excel.

The Google Docs version of my result (minus the graphs) can be found here. It's not intuitive to read, but in case you do: The upper left corner shows the input and just below it shows the new values. Then just below that, the new values are transformed back to show that the process is reversible (or decodable). The large matrices on the right are values that are computed in order to compute the decoded values below. They are visible so that you can see what goes on as values are changed. They are meaningless except to look at the formulas that make them up.

The two finished pages in that document show the 1-D DCT and its inverse. I started making the 2-dimensional version and then realized that I would need 64 8x8 matrices (or one 4-d matrix) to keep all of the intermediate values. I don't need to do that anyway. I understand how it works now.
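In code, the 2-dimensional version does not have to keep all of those intermediate matrices, because the 2-D DCT can be computed by running the 1-D transform over every row and then over every column of the result. The sketch below assumes the orthonormal DCT-II scaling and uses a made-up flat gray block as input.

```python
import math


def dct_1d(samples):
    """Forward DCT-II with orthonormal scaling."""
    size = len(samples)
    return [
        math.sqrt((1 if k == 0 else 2) / size)
        * sum(v * math.cos((2 * n + 1) * k * math.pi / (2 * size))
              for n, v in enumerate(samples))
        for k in range(size)
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dct_2d(block):
    """2-D DCT by separability: transform the rows, then the columns."""
    rows_done = [dct_1d(row) for row in block]
    columns_done = [dct_1d(column) for column in transpose(rows_done)]
    return transpose(columns_done)


flat_block = [[128] * 8 for _ in range(8)]  # made-up 8x8 block of one gray level
coefficients = dct_2d(flat_block)
print(round(coefficients[0][0], 3))
```

For a perfectly flat block, everything ends up in the top-left coefficient (8 x 128 = 1024 with this scaling) and every other entry comes out as zero, apart from floating-point noise.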

But now back to why it works. I sent an e-mail to Paul back at EWU and he explained that the transformed data is easier to truncate (plus you can do some other things with it that I don't understand). Truncating makes sense. When it comes to pictures and video, there is going to be data that the eye cannot see or that the brain cannot process quickly enough. This new data is essentially a representation of different frequency waves. Perhaps if you truncate the higher frequencies, you now have less data to send and have only changed the picture or movie in ways that are not noticeable.
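The truncation idea can be tried directly. This is a sketch under assumptions: it simply zeroes the highest-frequency coefficients, which is cruder than what a real encoder does, and the input line is a made-up, smoothly varying set of pixel values.

```python
import math


def basis(k, n, size):
    """Orthonormal DCT-II basis value for frequency k at sample position n."""
    scale = math.sqrt((1 if k == 0 else 2) / size)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * size))


def dct(samples):
    size = len(samples)
    return [sum(samples[n] * basis(k, n, size) for n in range(size)) for k in range(size)]


def idct(coefficients):
    size = len(coefficients)
    return [sum(coefficients[k] * basis(k, n, size) for k in range(size)) for n in range(size)]


def truncate_high_frequencies(coefficients, keep):
    """Zero every coefficient after the first `keep` (the lowest frequencies)."""
    return [c if k < keep else 0.0 for k, c in enumerate(coefficients)]


row = [100, 102, 104, 107, 110, 112, 113, 115]  # made-up, smoothly varying values
kept = truncate_high_frequencies(dct(row), keep=3)
approximation = idct(kept)
worst_error = max(abs(a - b) for a, b in zip(row, approximation))

print([round(v, 1) for v in approximation])
print(f"largest error after keeping 3 of 8 coefficients: {worst_error:.2f}")
```

For smooth data like this, most of the information sits in the first few coefficients, so throwing away the other five barely changes the rebuilt line. A line full of sharp edges would not survive as well.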

From what I've read, this is the start to why DCT is useful, but it is far from the complete reason. I think the next chapter is going to help me understand as it is on the subject of what the brain can and cannot see. I still don't know how useful this knowledge is going to be to my final project, but at least it's something.

Working the DCT... [Starting programming]

It has indeed been difficult to work in my office, especially now that I have run out of valid excuses to delay working. Now there's no more pretense; I just don't want to do it. Basically, because I'm not exactly sure what I'm doing, it's difficult to justify anything as worth my time.

That said, my reading of the MPEG book took a downward turn when it devoted an entire chapter to the discrete cosine transform. I have a BS in math, but I don't recall ever learning this. So I looked it up and tried to reconcile it with what the book is doing. Things don't perfectly match up, but I'm not entirely sure if I need to understand the tiny details of how it works.

But I really do want to understand, because understanding will help me explain exactly why the discrete cosine transform is so valuable in video encoding and decoding. So I think today, I'll write a program that does DCTs. I'd rather just do some problems by hand, but that appears to be a little bit more complicated because the equations are slightly more complex than those in a high school textbook.

Besides, doing some coding will make me feel like I'm being productive. Even when it might turn out to be a complete waste of time.

Monday, June 22, 2009

And we're back... [New week, new toys]

Took a three-day weekend, but it wasn't due to slacking. In fact, I proposed to my girlfriend of 18+ months, Jackie, on Friday afternoon when she got home from work, and we got to spend the entire weekend celebrating (She said yes, btw). There will be a video of the proposal coming soon for those interested. For now here is a picture of us a few hours after the proposal and a night of drinking with friends.

But now it's time to start getting into the rhythm of work. However, that's going to be a little bit difficult because now I have my own room in the CSE building in which to work! Larry and Jim totally came through on their efforts to provide me with almost everything I could want for this summer. My room has some computers that aren't being used so I hooked up one of their monitors to my laptop. There's lots of storage space, and I'm the only one who is going to be in here. So as you can tell, there's no way I'll get work done now. There are just too many neat things to play with.

Now I have a new goal for my previous list: Bring people into my room to say hello. People seem to be abnormally private and antisocial around here, so this has been difficult thus far.

Thursday, June 18, 2009

A few things... [Quick Update]

  1. The bus pass attempt was a bust. It's ... just not going to happen this summer. UW has failed me.

  2. They do have the complete ISO standard for MPEG-1 free for me to use. It is in several very large binders and could take me all summer just to understand it all. I hope my use of this remains very limited.

  3. Larry's friend working on video compression is Richard. He was in his office, but was leaving as I found him. I need to send him an e-mail and try to get a hold of him next week.

Wednesday, June 17, 2009

Way to go Larry! [Exciting news]

I finally got a hold of Larry just after posting my last post. There is some exciting news there. Larry has put in a request for me to get more permanent access to the wireless internet inside the CSE building as well as, get this, a desk! That desk would entail access to the building, a semi-private space to keep things so I don't have to lug them around, and a break room with a fridge and a microwave. I'm pretty speechless at what he's done for me so far.

He also signed requests for me to get access to the gym and to get a discounted bus pass. The gym membership went through immediately, and tomorrow I will get to try things out. The bus pass couldn't happen because they don't take credit cards and they closed before I could get to an ATM and back. It'll be interesting to see if it's approved because the form asks for my salary information, as it expects me to be an official employee of UW rather than just a researcher. I guess we'll see.

Oooh, how interesting... [Good book. New Goals.]

The introduction chapter to MPEG video: Compression standard has been most enlightening. I now at least understand some of the basic concepts behind how video compression works. It also has pointed me to a couple of other resources: www.mpeg.org for references and www.ansi.org for the actual MPEG standard in all of its ugliness (I have yet to actually find the standard as the website is an absolute abomination of a user interface, but it appears that ISO/IEC 11172-1:1993 might be the first of three sections in the standard. It costs $157 to view it.)

With this in mind, let's break down my earlier goal of writing a program that encodes video into some smaller, more immediate goals.
  1. Read the rest of MPEG video: Compression standard. Ok, so maybe not all 500 pages of textbook style writing, but at least a few more chapters on the details of the standard.
  2. Check with the UW library to see if I have access to ANSI standards for free through them.
  3. Explore www.mpeg.org. I have a feeling that I'm going to find some good stuff there.
  4. Search for applications already in existence that do what my program wants to do. There is a good possibility that I won't actually need to program my own application before moving on to the next stage of my project.

Cozy and ready to work... [Progress Made]

The library issued me a card so that I can check out books and access online materials. They were surprisingly prepared for my circumstances. Maybe students visiting from other schools for research isn't such a unique situation as everyone else is making it out to be. The card expires on July 4th for some reason that no one can explain. Maybe my fall registration will be in the system by then and I can ask for an extension.

I now have internet everywhere I could want to go on campus. Getting wireless in the computer science building was the last step, and that was granted for 7 days until Larry returns. He has been unseen since I met with him on Monday, but this is basically the week off for everyone before summer officially starts next week, so who can blame him?

I found a book in the Engineering Library on the MPEG video standard which so far appears to be easy to read. Another book that I want to check out is currently missing from the shelves for an unknown reason. I have a hold on it, so someone else can do the searching for me.

Here is more information about the books for my own convenience:
Mitchell, J. L. (1996). MPEG video: Compression standard. Digital multimedia standards series. New York: Chapman & Hall.
Library listing. Amazon listing.

Symes, P. (2001). Video compression demystified. New York: McGraw-Hill.
Library listing. Amazon listing.

I just want email... [Internet logistics]

I came to campus for a few hours yesterday to check up on my status of various things here. Larry was not in and could therefore not introduce me to his contact researching video compression, and I have not yet been approved to purchase a summer membership at the fitness center here. However, I was introduced through a front desk receptionist to Larry's budget supervisor who at least signed me up for a temporary wireless account for the summer and also mentioned the possibility of giving me access to school and department resources. I'm maintaining my optimism there.

Today, circumstances have brought me here to campus a full 2 hours before the library I typically use is open. Thus I have instead found a comfy seat in an otherwise empty food court area of their student HUB in order to see if I can get the wireless to work. Even though the signal is very strong, something keeps happening and I constantly need to reset my adapter or get new IP settings after only a few pages. I'm not doing anything other than Google, Wikipedia, and UW's own website, so I'm not sure why there's such a problem.

I did get wireless working long enough to discover that there doesn't seem to be any way to change my password for my temporary account, which is both odd and frustrating. I don't think I can memorize both my 9 digit id and 14 digit assigned password full of random letters, numbers, and punctuation. So my options now are to carry this password with me everywhere I go or log my computer in permanently. Neither option is particularly secure, which is why this policy by UW is so bizarre.

UPDATE:
The wireless works perfectly inside the undergraduate library. So for now, the problem might just be localized in the HUB.

UPDATE 2:
An e-mail to tech support produced a link to a website where I can change my password.

Monday, June 15, 2009

That was fast... [Research Woes]

It hasn't taken me too long to get completely lost in this subject. Everywhere you look, there are 20 websites describing the simple features of a file format (MP4 files store audio and video streams, etc). But there isn't one website that explains, for instance, what the first byte of an MP4 file signifies, where in the file the data is stored, the limitations on the data that is stored, or anything else of any use to anyone who wishes to play with the file.
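As a partial answer to the first-byte question: in the ISO base media file format that MP4 builds on, a file is a sequence of "boxes", and each box starts with a 4-byte big-endian size followed by a 4-character type code. So the first four bytes of an MP4 file are the size of its first box, which is usually of type `ftyp`. The sketch below walks the top-level boxes of some hand-built sample bytes. It is not a full parser, and the function names are made up for illustration.

```python
import struct


def read_box_header(data, offset=0):
    """Return (size, box_type, header_length) for the box starting at offset."""
    size, box_type = struct.unpack_from(">I4s", data, offset)
    header_length = 8
    if size == 1:
        # A size of 1 signals that a 64-bit size follows the type field.
        (size,) = struct.unpack_from(">Q", data, offset + 8)
        header_length = 16
    return size, box_type.decode("latin-1"), header_length


def list_top_level_boxes(data):
    """Walk the box headers and return (type, offset, size) for each one."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type, _ = read_box_header(data, offset)
        if size == 0:
            size = len(data) - offset  # size 0 means the box runs to the end
        if size < 8:
            break  # malformed header; stop rather than loop forever
        boxes.append((box_type, offset, size))
        offset += size
    return boxes


# Hand-built sample bytes (not a playable file): a 20-byte 'ftyp' box
# followed by an empty 8-byte 'free' box.
sample = (
    struct.pack(">I4s4sI4s", 20, b"ftyp", b"isom", 0, b"isom")
    + struct.pack(">I4s", 8, b"free")
)
print(list_top_level_boxes(sample))
```

Pointing `list_top_level_boxes` at the bytes of a real file (read with `open(path, "rb")`) should list boxes such as `ftyp`, `moov`, and `mdat`, with the compressed audio and video data typically living inside `mdat`.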

The following library search, "su:MPEG (Video coding standard)", looks to be a good start, but I can't access any of the materials that I want there until my status at UW is upgraded. Until then, I should try and do this search at the EWU library website because I might be able to access online materials through there.

My biggest problem right now is keeping all of the different file formats organized in my head. My experience with movie files has been nothing more than downloading them and playing them. What's the difference between a .avi file and a .mpg file to a user? Nothing, they both play movies on any media player I want. What's the difference between them behind the scenes? Everything. They don't even represent similar concepts for what they are supposed to do. One is a compression standard while the other just seems to wrap around files already compressed.

And that's not even getting into the mind-boggling number of parts that exist for the standards. MP4 stands for MPEG-4 Part 14, which means that it's the 4th version of the MPEG file standard and that the 14th section of the standard describes the file format. So what does this description say? Oh it's not so complicated; part 14 is just based on part 12. So what's part 12? That's simple; part 12 is just the basis for part 14. It's like the people writing this stuff don't get it themselves.

I might have to give up for the day and go talk with Larry's contact tomorrow. This is clearly one of those things that can only be resolved by pinning someone down and asking the right questions until they tell you what these resources are assuming that you already know.

Introduction

I'm here in Seattle as a visiting student from EWU in Spokane. I made the goal to graduate with my master's degree in computer science by December of this year, but I also wanted to spend the summer with my girlfriend, Jackie, who lives in Seattle. As a result, I resolved to come to UW as if it were my regular job in order to work on a thesis/research project.

Today is my first day on the UW campus. Everything is in disarray, and might remain this way for a while. But there's a chance that everything might come together soon, and I can settle in to a normal work schedule. I took a campus tour, walked through all the libraries, and talked with many front desks for various departments that are all empty today because it's the first Monday of the summer.

Thanks to Paul, the chair of my department back at Eastern, I now have a contact in the computer science department here. Larry has graciously agreed to help me out with various things so that I don't feel like a random bum off the street using the campus resources. Right now we're looking into getting an official status for me with the university so that I might be able to take advantage of certain things like a discount bus pass, gym facilities, a refrigerator, or (knock on wood) a desk.

There will be more about my project later, but for now I will say that it has to do with video and audio compression. I have a lot of background work to do in this field because I currently know nothing. My first major milestone this summer is to write a program that allows me to manipulate video files in order to alter the visual quality (e.g. resolution, frame rate, and color).

I'm not sure where to start, but Wikipedia has provided me with some ideas. I was hoping to find some easy-to-read published standards on video files, like the standards that exist for networking protocols, but I'm not having much luck so far. Larry has the name of someone on campus for me to talk to, as this person might be working in a similar field, but I won't be able to meet this person until tomorrow at the earliest. Until then, I'm going to keep browsing online for my answers.