
Encoding for HD with Windows Media

Ben Waggoner
Microsoft Corporation

April 2006






Introduction

About Ben
Ben Waggoner offers industry-leading digital video consulting, training, and encoding services. Ben was formerly Director of Consulting Services for Media 100 and Terran Interactive, and before that Chief Technologist and founder of Journeyman Digital. He is a contributing editor for DV Magazine, and frequently writes about video compression for it. Visit Ben's Web site to learn more.
 
So far, this series of articles has focused on how to get source content to an encoder. The first two articles, Zero to HD in 60 Seconds and Understanding HD Formats, describe important concepts about high definition (HD) formats and storage needs. The most recent article, Desktop HD Capture Solutions, describes common HD video capture issues. This article addresses compression: how to encode the file itself.

High definition encoding comes with several constraints. We are limited not only by the sheer size of our files, but also by the power of the device that plays the media. HD content, especially at a 1920x1080 frame size (also referred to as 1080), is very challenging to play back on older computers. So HD encoding becomes a balancing act between providing a good visual experience and providing reliable playback.



Encoding Modes

There are several encoding modes available in Windows Media Encoder 9 Series, and picking the right one for a particular project can be a challenge. Fortunately, only two of them are appropriate for HD encoding.

Constant Bit Rate
Constant bit rate (CBR) encoding is required for real-time streaming. It produces a file whose bit rate, averaged over any stretch as long as the buffer duration, does not exceed the target bit rate. So, in a 1000 kilobits per second (Kbps) file with a five-second buffer, any arbitrary five seconds of the file will have a bit rate of 1000 Kbps or less. Not only is CBR required for streaming, it is also the best choice when decode speed, rather than file size, is the limiting factor.
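The buffer rule above can be checked with a short sketch. It assumes you already have per-frame sizes in bits (for example, exported from an analysis tool) and models the constraint as a simple sliding-window average, which is a simplification of the codec's real buffer model. The function names are hypothetical.

```python
def max_window_kbps(frame_bits, fps, buffer_seconds):
    """Highest average bit rate (Kbps) over any buffer-length run of frames."""
    window = round(fps * buffer_seconds)  # number of frames in one buffer
    if not 1 <= window <= len(frame_bits):
        raise ValueError("buffer must span between one frame and the whole clip")
    running = sum(frame_bits[:window])
    worst = running
    # Slide the window one frame at a time, keeping the largest total seen.
    for i in range(window, len(frame_bits)):
        running += frame_bits[i] - frame_bits[i - window]
        worst = max(worst, running)
    return worst * fps / window / 1000


def meets_cbr_buffer(frame_bits, fps, buffer_seconds, target_kbps):
    """True if no buffer-length window exceeds the target bit rate."""
    return max_window_kbps(frame_bits, fps, buffer_seconds) <= target_kbps


# 10 seconds of 30 fps video at a steady 33,000 bits per frame (990 Kbps).
frames = [33_000] * 300
print(max_window_kbps(frames, 30, 5))         # 990.0
frames[100] = 1_000_000                       # one very expensive frame
print(meets_cbr_buffer(frames, 30, 5, 1000))  # False
```

A single oversized frame is enough to push one five-second window past 1000 Kbps, which is exactly the kind of spike a CBR encode must avoid.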

CBR supports both one-pass and two-pass encoding. For Web data rates, use two-pass encoding if at all possible. The first pass lets the codec measure every frame in the file, so it can distribute bits optimally.

For HD, however, use one-pass CBR. A one-pass CBR encode has to estimate how the video will change in the future, and sometimes that estimate is wrong. Even so, at the high data rates of HD, one-pass CBR actually gives better results than two-pass CBR with today's Windows Media Video 9 codec. Future Microsoft codecs will support other modes for HD as well.

While CBR is a necessity for real-time streaming, real-time streaming of HD is still in its early days. Most HD content is still distributed and played as downloaded files. These files play back fine using CBR, but for file-based playback CBR is less compression-efficient than other methods, because the easy portions of the video (sections with very little movement or change in the picture) get a higher data rate than they need. On the other hand, because CBR doesn't have the data-rate peaks that variable bit rate (VBR) encoding does, it is easier to play back. For example, a computer that plays a CBR file smoothly may drop frames on a VBR file with the same average data rate, because the VBR file's peaks exceed that rate. The next article in this series will cover real-time encoding.

Peak Limited VBR
There are three different variable bit rate (VBR) modes in the Windows Media Encoder, but theoretically, only peak bit rate-based VBR is appropriate for HD encoding. However, the current implementation has some limitations for HD.

The basic VBR mode is the "unconstrained" mode, called bit rate-based VBR. With this mode, only an average bit rate is specified, and the number of bits given to each frame is proportional to how difficult that frame is to encode. If the complexity is very constant, the bit rate will also be very constant; if the content is highly variable, the bit rate will be more variable. That is the drawback of unconstrained VBR: it can produce very high peak bit rates that overwhelm the decoder, leading to dropped frames.
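A toy model makes the peak problem concrete. This sketch assumes each one-second segment has a made-up "complexity" score and hands out bits strictly in proportion to it; the real encoder's allocation is far more sophisticated, and the function name is hypothetical.

```python
def unconstrained_vbr_rates(complexity, average_kbps):
    """Per-second bit rates (Kbps) proportional to each second's complexity."""
    mean = sum(complexity) / len(complexity)
    return [average_kbps * c / mean for c in complexity]


# Three easy seconds followed by one very hard second, 2000 Kbps average.
rates = unconstrained_vbr_rates([1, 1, 1, 5], 2000)
print(rates)       # [1000.0, 1000.0, 1000.0, 5000.0]
print(max(rates))  # 5000.0: the peak is 2.5 times the average
```

The average stays at 2000 Kbps, but the decoder has to cope with a 5000 Kbps second, which is what leads to dropped frames on HD playback.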

For HD content where playback performance is the paramount concern, the optimum file-based encoding mode is CBR. However, when both playback performance and file size matter, peak bit rate-based VBR (also called constrained or peak-limited VBR) is an option. It works like unconstrained VBR, but with the buffer limitation of CBR: you specify an average bit rate as well as a peak bit rate, defined as a maximum bit rate over a buffer duration. For example, a file can be encoded with an average bit rate of 5000 Kbps and a peak of 9000 Kbps over a 5-second buffer. That file would play wherever a 9000 Kbps CBR file with a 5-second buffer would play, but it would be only as large as a 5000 Kbps CBR file.
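The size claim is simple arithmetic: file size depends only on the average bit rate. This sketch assumes a two-hour program, counts video only, uses decimal megabytes, and ignores container overhead; the function name is hypothetical.

```python
def video_size_mb(average_kbps, duration_seconds):
    """File size in decimal megabytes for a given average bit rate (video only)."""
    bits = average_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000


two_hours = 2 * 60 * 60
print(video_size_mb(5000, two_hours))  # 4500.0 (5000 Kbps average, peak VBR)
print(video_size_mb(9000, two_hours))  # 8100.0 (9000 Kbps CBR)
```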

The third VBR mode is quality-based VBR. Unlike the other modes, it takes a quality value instead of a bit rate, and the encoder gives each frame as many or as few bits as it needs to meet that quality target. Because quality-based VBR has no peak constraint, its files often can't be played back in real time, so it is not appropriate for HD delivery.



Data Rates

There are two data rates that go into any peak limited VBR encode: average and peak. The average applies to the whole file, and so determines the final file size. The peak is the highest data rate allowed over the buffer duration. These are two different values, and apart from the average never exceeding the peak, there isn't any real connection between them. If you are encoding a movie for a device that supports a maximum peak of 9000 Kbps (9 megabits per second, or Mbps), the average bit rate could be 2000 Kbps, 7000 Kbps, or anything else up to the peak, depending on how much content needs to fit on the media. The average bit rate determines the average quality, so use the highest average bit rate that still lets the content fit. If there is enough space to use the same value for average and peak, just use CBR instead.
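The "highest average that still fits" rule can be sketched as follows. The 4.7 GB capacity and two-hour duration are illustrative assumptions, the 440 Kbps audio figure comes from the audio section of this article, and the function name is hypothetical. The sketch uses decimal units and ignores container overhead, so leave some margin in practice.

```python
def plan_average_rate(capacity_gb, duration_seconds, peak_kbps, audio_kbps=0):
    """Pick the highest video average that fits; fall back to CBR at the peak."""
    # Decimal units: 1 GB = 8,000,000 kilobits. Container overhead is ignored.
    total_kbps = capacity_gb * 8_000_000 / duration_seconds
    video_kbps = total_kbps - audio_kbps
    if video_kbps <= 0:
        raise ValueError("audio alone exceeds the available space")
    if video_kbps >= peak_kbps:
        return "CBR", peak_kbps  # room to spare: the average can equal the peak
    return "peak VBR", video_kbps


# Two-hour movie: peak VBR with a video average of roughly 4782 Kbps.
print(plan_average_rate(4.7, 2 * 3600, 9000, audio_kbps=440))
# Ten-minute clip: everything fits at the peak, so just use CBR.
print(plan_average_rate(4.7, 10 * 60, 9000, audio_kbps=440))  # ('CBR', 9000)
```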

The peak data rate does not vary from project to project: all files encoded for a particular specification or device use exactly the same peak values. The peak determines the quality in the hardest parts of the content, such as very fast, complex motion. Raising the peak data rate won't change the appearance of most of the content, just the hardest scenes. With some content, the codec might not even use the full peak bit rate range.

A great new tool for analyzing encoded WMV files is in development: WMSnoop, from Sliq Media Technologies. It lets you see how the bit rate is distributed across a file, as shown in the following illustration.

Screenshot of WMSnoop



Frame Sizes

Frame size is, of course, what makes HD high definition. HD can be defined as anything bigger than the biggest standard definition frame size, the 720x576 used by PAL. The standard HD frame sizes for production are 1920x1080 (1080) and 1280x720 (also referred to as 720). However, one great thing about Windows Media is that it supports arbitrary frame sizes. The only requirement is that the height and width must be even numbers.

Even though the 16:9 aspect ratio of HD means that less letterboxing is required when showing film content, a lot of HD content will still have letterboxing (black bars at the top and bottom of the frame). But since WMV files can be any shape, there is no need to encode the letterboxing. Instead, crop out the black bars and reduce the frame size correspondingly. So, when you have a 1280x720 file containing a letterboxed 2.35:1 movie, you can leave out the letterboxing when you encode to WMV, and the result will be a 1280x544 movie. This keeps every picture pixel from the source, but is substantially easier to play back because there are fewer pixels to decode.
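The crop values can be worked out from the frame width and the film's aspect ratio. This sketch assumes the bars are centered and that you know the content's aspect ratio exactly. In practice, check the source visually, because some transfers have a few soft-edged lines that are worth cropping as well. The function name is hypothetical.

```python
def letterbox_crop(width, height, content_aspect):
    """Return (active_height, crop_top, crop_bottom) for centered letterboxing."""
    active = int(width / content_aspect)  # picture height implied by the aspect ratio
    active -= active % 2                  # WMV frame dimensions must be even
    if active > height:
        raise ValueError("content is taller than the frame; nothing to crop")
    bars = height - active
    return active, bars // 2, bars - bars // 2


print(letterbox_crop(1280, 720, 2.35))   # (544, 88, 88)
print(letterbox_crop(1920, 1080, 2.35))  # (816, 132, 132)
```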

Content that is 720 is normally encoded with square pixels. With Windows Media, 1080 content is normally encoded anamorphically: the width is squeezed from 1920 to 1440, while the height remains the same. Setting the pixel aspect ratio value makes the player stretch the video back to the correct aspect ratio on playback. Applying this to the 2.35:1 example above, a 1920x1080 movie would be cropped to 1920x816 and then scaled horizontally to 1440x816. On playback, it is stretched horizontally to restore the original 2.35:1 shape.
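The pixel aspect ratio follows directly from the squeeze. This sketch assumes square display pixels and expresses the pixel aspect ratio as the width of one encoded pixel relative to its height. How the value is actually stored in the file is not shown, and the function name is hypothetical.

```python
from fractions import Fraction


def anamorphic(cropped_width, cropped_height, encoded_width):
    """Pixel aspect ratio and fraction of pixels kept for a horizontal squeeze."""
    if encoded_width % 2 or cropped_height % 2:
        raise ValueError("WMV frame dimensions must be even")
    par = Fraction(cropped_width, encoded_width)          # stretch applied on playback
    pixels_kept = Fraction(encoded_width, cropped_width)  # height is unchanged
    return par, pixels_kept


par, kept = anamorphic(1920, 816, 1440)
print(par)              # 4/3
print(1440 * par)       # 1920: displayed width after stretching
print(float(1 - kept))  # 0.25: a quarter fewer pixels to decode
```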

The following illustration shows a 2.35:1 movie at 1920x1080.

Example of a 1920x1080 movie with letterboxing

The following illustration shows the original 1920x1080 source cropped to 1920x816.

Example showing the same movie, cropped to 1920x816

The following illustration shows the source scaled horizontally to 1440x816.

Example showing the same movie, scaled horizontally to 1440x816

The majority of consumer digital HD displays in homes today are 1280 pixels wide or less, and most analog displays cannot resolve 1440 pixels of horizontal detail, so in most cases this anamorphic squeeze causes no visible quality loss. Encoding at the full 1920-pixel width only offers higher quality on high-resolution computer monitors (at least 1920x1200) or the recently introduced 1080p displays.



Frame Rates

When encoding HD, always use exactly the same frame rate as your source. A lower frame rate results in choppy motion, and a higher frame rate wastes bandwidth.

Note that the PAL frame rates of 25 and 50 are whole numbers, but NTSC-derived frame rates are actually 0.1 percent slower than their nominal values. So 60 is actually 60000/1001 (abbreviated as 59.94), 30 is actually 30000/1001 (abbreviated as 29.97), and 24 is actually 24000/1001 (abbreviated as 23.976). Windows Media can handle any of these rates, but be sure to match the frame rate of the source precisely. Anything captured from tape uses the 0.1 percent lower values, but something rendered out of an application like After Effects could go either way.
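Exact rational arithmetic shows where the familiar abbreviations come from, and why a "close enough" rate is not. This is a minimal sketch using Python's standard fractions module. The helper name is hypothetical.

```python
from fractions import Fraction


def ntsc_rate(nominal_fps):
    """Exact NTSC-derived frame rate: the nominal rate slowed by 1000/1001."""
    return Fraction(nominal_fps * 1000, 1001)


for nominal in (24, 30, 60):
    exact = ntsc_rate(nominal)
    print(nominal, exact, round(float(exact), 3))
# 24 24000/1001 23.976
# 30 30000/1001 29.97
# 60 60000/1001 59.94

# Why matching matters: frames in one hour at 30 fps vs. 30000/1001 fps.
print(round(float(3600 * 30 - 3600 * ntsc_rate(30)), 1))  # 107.9
```

Encoding a 29.97 source as an even 30 fps would misplace roughly 108 frames per hour, which shows up as audio drift or duplicated frames.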

One special case of frame rate is telecined content. Film runs at 24 progressive frames per second (fps), or 24p, and NTSC video at 29.97 interlaced fps. When film is converted to video, it is first slowed down to 23.976 fps and then converted using a pattern called 3:2 pulldown. With pulldown, the first frame of film becomes three fields of video, the second becomes two fields, the third becomes three fields, the fourth becomes two fields, and so on. This is easy to detect when you step through the source one frame at a time: you will see a repeating pattern of three progressive frames followed by two interlaced frames. With telecined source like this, apply an inverse telecine to get the best results. Inverse telecine reverses the telecine process, turning the video back into a 23.976 fps progressive sequence. Since 24p encodes and plays back so much better than 60i, this is a big, big win for compression. The 24p playback also has smoother, more accurate motion than the telecined 60i.
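The pulldown pattern and its reversal can be simulated on labeled frames. This is a toy sketch: each film frame is just a letter, field order and parity are ignored, and removing duplicates works only because consecutive film frames are assumed to differ. A real inverse telecine has to compare actual field images and cope with cadence breaks at edits. All function names are hypothetical.

```python
def pulldown_fields(film_frames):
    """3:2 pulldown: alternate film frames become three fields, then two."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * (3 if i % 2 == 0 else 2))
    return fields


def to_video_frames(fields):
    """Pair consecutive fields into interlaced video frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def inverse_telecine(fields):
    """Drop the repeated fields to recover the original film frames."""
    film = []
    for field in fields:
        if not film or film[-1] != field:
            film.append(field)
    return film


fields = pulldown_fields("ABCD")
frames = to_video_frames(fields)
print(frames)  # [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'C'), ('D', 'D')]
print(["progressive" if a == b else "interlaced" for a, b in frames])
print(inverse_telecine(fields))  # ['A', 'B', 'C', 'D']
```

Four film frames become five video frames (the 24-to-30 ratio), and two of those five mix fields from different film frames, which is the interlaced pair you see when stepping through the source.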



Progressive and Interlaced

Windows Media Video 9 codecs support interlaced video. However, both compression efficiency and playback performance are substantially better with progressive encoding, so work with progressive sources when possible. And of course, telecined source should be inverse-telecined. But with true 1080/60i sources, a full 1080/60i WMV file can be very difficult to play back on anything but Windows Media Player 10 on a very recent machine. Converting from 1080/60i to 720/60p might be the best option: motion smoothness is preserved and image quality remains good, while decode complexity is substantially reduced.



Key Frames

A key frame (known in MPEG as an I-frame) is a self-contained frame. Most frames in a WMV file will be delta frames, which are encoded as changes from other frames, typically the previous one. However, it is important to have periodic key frames to provide good random access and quick recovery from a playback glitch.

Web video tends to have infrequent key frames, like one every 10 seconds, because compression efficiency is paramount. In contrast, HD tends to have a lot more key frames. This is because playback performance is as much of a challenge in HD as compression efficiency. A key frame every two to four seconds is a good starting point for an HD WMV file.
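Some tools express key-frame spacing in frames rather than seconds, so the two-to-four-second guideline has to be converted for the source frame rate. This is a minimal sketch, and the helper name is hypothetical.

```python
def keyframe_interval_frames(fps, seconds):
    """Key-frame distance in frames for a target spacing in seconds."""
    return max(1, round(fps * seconds))


for fps in (24000 / 1001, 30000 / 1001, 60000 / 1001):
    print(round(fps, 3), [keyframe_interval_frames(fps, s) for s in (2, 4)])
# 23.976 [48, 96]
# 29.97 [60, 120]
# 59.94 [120, 240]
```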



Decoder Complexity

One of the rarely discussed parameters in Windows Media Video encoding is decoder complexity. It constrains internal codec parameters that affect decoding performance. For HD encoding today, set it to Main instead of the default, Auto. That provides the best combination of quality and playback speed.



Audio

Even though audio might be only 5 percent of the bits in an HD file, it's half the experience. And Windows Media offers some deep functionality for giving HD content incredible-sounding soundtracks.

Codecs
Windows Media technologies support a variety of audio codecs, but only three are applicable to HD WMV files. For more information about these and other Windows Media Audio codecs, see Windows Media Audio 9 Series Codecs.

Windows Media Audio 9

The standard Windows Media Audio codec, now at version 9, has been part of Windows Media for the better part of a decade. It sounds excellent at higher bit rates. Its main limitation is that it supports only mono or stereo audio, at up to 48 kilohertz (kHz) and 16 bits.

At medium bit rates, the Windows Media Audio 9 codec offers both CBR and VBR encoding modes. At very low bit rates, it is CBR only.

Windows Media Audio 9 Professional

For content with more than two channels, a sample rate above 48 kHz, or more than 16 bits of resolution, Windows Media Audio 9 Professional provides a full set of features for "high definition" audio. Although its minimum bit rate of 128 Kbps is not a good fit for many Web applications, it's fine for HD. This codec produces better sound than the Windows Media Audio 9 codec at the same data rate, so use it for most HD projects.

Windows Media Audio 9 Lossless

Windows Media Audio 9 Lossless provides lossless encoding of audio content. Depending on how complex the audio is, this codec can reduce the file to about 50 percent of the size of the same uncompressed audio. Lossless supports the same full range of modes as the Windows Media Audio 9 Professional codec. Since the bit rate is not controlled, the lossless codec is mainly useful for archiving. It can certainly be used for delivery, although it's almost always overkill. Keep in mind that a full 7.1, 96 kHz, 24-bit lossless encode will run approximately 9 megabits per second.
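The 9 Mbps figure follows from the uncompressed PCM rate. This sketch applies the rough 50 percent reduction quoted above, although the actual lossless ratio varies with the content. The helper name is hypothetical.

```python
def pcm_kbps(channels, sample_rate_hz, bits_per_sample):
    """Uncompressed PCM bit rate in Kbps."""
    return channels * sample_rate_hz * bits_per_sample / 1000


uncompressed = pcm_kbps(8, 96_000, 24)  # 7.1 = eight channels
print(uncompressed)                     # 18432.0
print(uncompressed * 0.5 / 1000)        # 9.216: roughly 9 Mbps at ~50% reduction
print(pcm_kbps(2, 44_100, 16))          # 1411.2 (CD audio)
```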

Sample Rate
Most sources today are 44.1 or 48 kHz, with some "high definition" audio at 88.2 or 96 kHz. When making HD content, it's best to stick with the source sample rate. At HD data rates, there is rarely a need to save bits by reducing the sample rate, and increasing the sample rate will not improve the quality.

Bit Depth
16-bit audio is CD quality, which is enough for most listeners. But if source with more than 16 bits is available, such as 20-bit or 24-bit, the Windows Media Audio 9 Professional codec offers modes to encode and decode up to 24 bits. The stereo modes of Windows Media Audio 9 Professional only support 24-bit, but you can encode 16-bit or 20-bit sources as 24-bit without any problems.

Channels
Traditionally, digital audio has been mono or stereo. If that's what the source is, keep it that way. If multichannel audio is available, such as 5.1 or 7.1, use it. Even if the playback computer or device doesn't support multichannel audio, the player will gracefully mix the audio down to whatever speakers are available, all the way down to mono if need be.

Note that Windows Media Encoder 9 Series and most other tools do not actually expose a user interface for 7.1 audio encoding, but 5.1 is usually available. In any case, very few consumers have 7.1 systems yet.

Data Rate
Given the huge data rates for HD video, audio rarely accounts for more than 5-10 percent of the file; 440 Kbps is typical for 5.1 48 kHz content. Audio data rates also do not have a significant impact on decoding performance. Thus, there is rarely a need to use low bit rates for HD audio. Always use a bit rate high enough that there are no audible encoding artifacts. A good starting point with the Windows Media Audio 9 Professional codec is to find the lowest data rate that supports the combination of sample rate, bit depth, and channels you want, and then pick the next higher data rate. That said, no combination the Windows Media Audio 9 Professional codec offers will sound bad to the casual listener.
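The "lowest supported rate, then one step up" rule of thumb can be expressed as a small lookup. The mode table below is entirely hypothetical and invented for illustration. It is not the codec's real list of modes, so use the list your encoding tool actually shows. The class and function names are also hypothetical.

```python
from typing import NamedTuple


class AudioMode(NamedTuple):
    kbps: int
    channels: str  # e.g. "stereo", "5.1"
    sample_rate_hz: int
    bits: int


# HYPOTHETICAL mode list for illustration only; not the codec's real modes.
HYPOTHETICAL_MODES = [
    AudioMode(128, "stereo", 48_000, 24),
    AudioMode(192, "stereo", 48_000, 24),
    AudioMode(256, "5.1", 48_000, 24),
    AudioMode(384, "5.1", 48_000, 24),
    AudioMode(440, "5.1", 48_000, 24),
]


def pick_audio_rate(modes, channels, sample_rate_hz, bits):
    """Lowest rate that supports the format, then one step up if there is one."""
    wanted = (channels, sample_rate_hz, bits)
    rates = sorted(m.kbps for m in modes
                   if (m.channels, m.sample_rate_hz, m.bits) == wanted)
    if not rates:
        raise ValueError("no mode supports that combination")
    return rates[1] if len(rates) > 1 else rates[0]


print(pick_audio_rate(HYPOTHETICAL_MODES, "5.1", 48_000, 24))  # 384
```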



Encoding Computers

It is important to consider the computer processor, the amount of RAM, and the operating system on the computer you will be using to encode HD.

Processors
All things being equal, compression speed depends largely on the speed of the processor. Note that this isn't speed in gigahertz: measuring processor performance purely by clock speed is like trying to figure out how fast a car can go by looking at its revolutions per minute (RPM).

Today's fastest, single-CPU computers use dual-core technology (either AMD or Intel). The performance available today is much better than what it was just a couple of years ago. It generally doesn't pay to try to limp along with an older computer for HD encoding.

Windows Media Encoder 9 Series can use up to two processors for audio and up to four for video, distributing the encoding load efficiently. The good news is that dual-processor, dual-core systems are becoming very competitively priced, and offer four real processor cores for encoding. Given the cost of the other equipment in a digital media workstation, spending an extra $1,000 on a second processor is well worth it, typically yielding around a 60 percent improvement in encoding speed.

And remember that CPU speed isn't everything for performance. Speed of storage, front-side bus, and other factors can all play a big part in performance.

RAM
The rule for RAM is that you need to have enough, but enough is enough. If the operating system ever runs out of RAM, the machine will slow to a crawl as virtual memory is engaged, using the hard drive for memory. However, as long as enough memory is available, adding more will not help the speed at all.

This is easy to monitor: run Windows Task Manager and watch memory use. As long as at least 20 percent of memory is free at all points during an encoding session, the system has enough RAM. If free memory drops below 20 percent, consider adding more.
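If you jot down free-memory readings from Task Manager during an encode, the 20 percent rule becomes a one-line check. The readings and the 2048 MB total below are hypothetical, and so is the function name.

```python
def enough_ram(free_mb_samples, total_mb, min_free_fraction=0.20):
    """True if free memory never fell below the threshold during the encode."""
    return min(free_mb_samples) / total_mb >= min_free_fraction


# Hypothetical free-memory readings (MB) on a 2048 MB machine.
print(enough_ram([900, 610, 450], 2048))  # True: lowest point is about 22% free
print(enough_ram([900, 610, 300], 2048))  # False: dipped to about 15% free
```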

HD encoding, of course, uses a lot more RAM. ProCoder has simultaneous rendering, which can also use a lot of RAM. For complex ProCoder HD projects with multiple simultaneous outputs, I've needed as much as 2 gigabytes (GB) of RAM.

RAM speed matters as well. A modern compression workstation should use at least dual-channel double data rate (DDR) memory. Compression is very sensitive to memory bandwidth, so having fast RAM can matter more than having a fast processor.

Operating System
For WMV encoding, Windows XP is definitely the operating system of choice. While many of the tools will run on Windows 98 and later, Windows XP provides better performance on the same hardware, especially with multiple processors or hyperthreaded processors like Intel's Pentium 4. Also, the latest Windows Media codecs are only available on Windows XP.



Additional Information

A note on the Windows XP Professional x64 edition operating system
There has been a lot of buzz about the performance gains of Windows XP Professional x64 edition. This operating system runs in the native 64-bit mode supported by current workstation-class chips from Intel and AMD. For some applications, running in 64-bit optimized mode on the 64-bit OS, speed gains of 10-30 percent can be seen, mainly due to more efficient memory use and more CPU registers.

However, this isn't a mainstream solution for WMV encoding yet, and may not be for a while. While there is a Windows Media Encoder 64-bit Edition, most of the source codecs that would be used with the content aren't available in 64-bit versions, and all the other compression tools are still 32-bit only.

A few years from now, it could be a 64-bit world. But for now, 32-bit is recommended for digital media authoring.

Windows Media Video 9 Advanced Profile
The next big thing for Windows Media is the new Windows Media Video 9 Advanced Profile codec. This codec is an enhanced version of the Windows Media Video 9 codec, with new features for more traditional video applications as well as improved compression efficiency in all applications.

The Windows Media Video 9 Advanced Profile codec is the future of Windows Media for consumer electronics devices. Microsoft has submitted the codec to the Society of Motion Picture and Television Engineers (SMPTE), where it is the proposed VC-1 standard. This made the details of the Windows Media Video 9 bit stream available to all, a big help for those interested in building interoperable encoders and players. VC-1 isn't just Advanced Profile, though: it also includes the Simple and Main profiles, which the original Windows Media Video 9 encoder creates and players decode. For more information about VC-1 and the Windows Media Video 9 Advanced Profile codec, see VC-1 Technical Overview.

There are two main features in Advanced Profile that make it a big improvement for use in the video industry. First, it has improved tools for interlaced content, making interlaced content almost as easy to encode as progressive content. This increases the compression efficiency advantage of Windows Media Video 9 over the legacy MPEG-2 used in most digital broadcasting today. Second, it has better support for a variety of transport protocols. For example, there is a SMPTE proposal for carrying the Windows Media Video 9 Advanced Profile codec inside an MPEG-2 transport stream, meaning that legacy server and routing products will be able to move Advanced Profile content around; only the encoders and decoders would need to be upgraded.

Today, the WMV9 Advanced Profile codec is made available by installing the Windows Media Format software development kit (SDK) 9.5. Microsoft will soon release an updated codec with full VC-1 compliance and many quality and performance enhancements.




© 2016 Microsoft Corporation. All rights reserved.