So far, this series of articles has focused on how to get source content to an encoder. The first two articles, Zero to HD in 60 Seconds and Understanding HD Formats, described important concepts about high definition (HD) formats
and storage needs. The previous article, Desktop
HD Capture Solutions, described common HD video capture issues. This article addresses compression and how to encode
the file itself.
High definition encoding comes with several constraints. Not only are we constrained by the sheer size of our files, but
also by the power of the device that plays the media. In addition, HD content, especially at a 1920x1080 frame size (also
referred to as 1080), is very challenging to play back on older computers. So, it becomes a balancing act between
providing a good visual experience and providing reliable playback.
There are several encoding modes available in Windows Media Encoder 9 Series, and picking the right one for a particular
project can be a challenge. Fortunately, there are only two appropriate for HD encoding.
Constant Bit Rate
Constant bit rate (CBR) encoding is required for real-time streaming. It produces a file whose average bit rate, measured
over the buffer duration, never exceeds the target bit rate. So, in a 1000 kilobits per second (Kbps) file with a five-second buffer, any arbitrary
five seconds from the file will have an average bit rate of 1000 Kbps or less. Not only is CBR required for streaming, it is
also the best choice when the decode speed, rather than the file size, is the limiting factor.
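The buffer rule can be checked with a sliding window. The following is a minimal sketch, not the encoder's actual buffer model: it assumes the stream is described as a hypothetical list of kilobits spent in each second, and the function names are invented for illustration.

```python
def max_window_kbps(kbits_per_second, window_seconds):
    """Return the highest average bit rate (Kbps) found in any window.

    kbits_per_second is an assumed input format: one entry per second
    of the file, giving the kilobits spent in that second.
    """
    if window_seconds <= 0 or window_seconds > len(kbits_per_second):
        raise ValueError("window must fit inside the stream")
    window_total = sum(kbits_per_second[:window_seconds])
    worst = window_total
    # Slide the window one second at a time, updating the running total.
    for i in range(window_seconds, len(kbits_per_second)):
        window_total += kbits_per_second[i] - kbits_per_second[i - window_seconds]
        worst = max(worst, window_total)
    return worst / window_seconds


def fits_cbr(kbits_per_second, target_kbps, buffer_seconds):
    """True if no buffer-length window averages more than the target rate."""
    return max_window_kbps(kbits_per_second, buffer_seconds) <= target_kbps


steady = [1400, 600, 1000, 1000, 1000, 900, 1100, 1000]
bursty = [2000, 2000, 2000, 0, 0, 0, 0, 0, 0, 0]

print(fits_cbr(steady, 1000, 5))  # individual seconds vary, but every window fits
print(fits_cbr(bursty, 1000, 5))  # whole-file average is only 600 Kbps, yet the opening burst does not fit
```

The second stream shows why the whole-file average alone says little about playback: its average is well under the target, but its first five seconds are not.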
CBR supports both one-pass and two-pass encoding. For Web data rates, if at all possible, use two-pass encoding. This lets
the codec have a complete measurement of all the frames in the file, enabling it to provide optimal distribution of bits.
For the purposes of HD, however, use one-pass CBR. A one-pass CBR encode estimates how the video will change in the future,
and sometimes that estimate is wrong. With the high data rates of HD, the estimate that one-pass CBR gives is actually better
than two-pass CBR with today's Windows Media Video 9 codec. However, future Microsoft codecs will support other modes as well.
While CBR is a necessity for real-time streaming, real-time streaming is in its early days for HD. Most HD content is still
being distributed and played as downloaded files. These files play back fine using CBR, but CBR is less compression-efficient
than other methods for file-based playback, because the easy portions of the video (sections with
very little movement or change in the picture) end up with a higher data rate than necessary. However, since CBR
doesn't have the peaks of higher data rates that variable bit rate (VBR) does, it is still more efficient for playback. For
example, a computer can play back a CBR file with a higher average data rate more efficiently than it can play back a VBR
file at the same data rate. The next article in this series will cover real-time encoding.
Peak Limited VBR
There are three different variable bit rate (VBR) modes in the Windows Media Encoder, but theoretically, only peak bit rate-based
VBR is appropriate for HD encoding. However, the current implementation has some limitations for HD.
The basic VBR mode is the "unconstrained" mode called bit rate-based VBR. With this mode, only an average bit rate is specified.
The number of bits of each frame will be proportional to how difficult it is to encode. If the complexity is very constant,
the bit rate will also be very constant. If the content is highly variable, the bit rate will be more variable. This is
a drawback to unconstrained VBR because it can produce very high peak bit rates that can overwhelm the decoding process,
leading to dropped frames.
For HD content, where playback performance is the paramount concern, the optimum file-based encoding scheme is CBR. However,
in cases where playback performance and file size are both of concern, constrained VBR is an option. This is like unconstrained
VBR, but with the buffer limitation options of CBR. Thus, an average bit rate is specified as well as a peak bit rate, defined
as a maximum bit rate and a duration. For example, a file can be encoded with an average bit rate of 5000 Kbps and
a peak bit rate of 9000 Kbps over 5 seconds. That file would play wherever a CBR file at 9000 Kbps with a 5-second
buffer would play, but it would only be as large as a CBR file at 5000 Kbps.
The other VBR mode is quality-based VBR. Unlike the other modes, a quality value is specified instead of a bit rate. The
encoder gives each frame as many or as few bits as it needs to meet the quality target. Because quality-based VBR does not
have any peak constraints, it is not appropriate for HD delivery because it generally isn't capable of real-time playback.
There are two data rates that go into any peak limited VBR encode: average and peak. Average is the average
of the whole file, and so determines the final file size. Peak is the highest allowed data rate over the buffer duration.
These are two different values, and there isn't any real connection between them. If you are encoding a movie for a device
that supports a 9000 kilobit per second (Kbps) maximum peak, the average bit rate could be 2000 Kbps, 7000 Kbps,
or anything else, depending on how much content needs to fit on the media. The average bit rate determines the average quality,
and the highest average bit rate that still allows the content to fit should be used. If there is enough space that the
same value can be used for average and peak, just use CBR instead.
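Picking the average comes down to simple arithmetic on the space available. The sketch below is one way to work it out, assuming 1 Kbps means 1000 bits per second and ignoring container overhead; the function names and the sample capacity are hypothetical.

```python
def max_average_kbps(capacity_bytes, duration_seconds, audio_kbps=0):
    """Highest average video bit rate (Kbps) that still fits the content.

    Assumes 1 Kbps = 1000 bits per second and ignores file-format overhead,
    so treat the result as an upper bound rather than an exact figure.
    """
    total_kbps = capacity_bytes * 8 / 1000 / duration_seconds
    return total_kbps - audio_kbps


def encoding_mode(average_kbps, peak_kbps):
    """If the average can be as high as the peak, plain CBR is enough."""
    return "CBR" if average_kbps >= peak_kbps else "peak-constrained VBR"


# Example: a two-hour movie, 4.5 billion bytes of space, 440 Kbps of audio.
video_kbps = max_average_kbps(4_500_000_000, 2 * 60 * 60, audio_kbps=440)
print(video_kbps)                       # 4560.0
print(encoding_mode(video_kbps, 9000))  # peak-constrained VBR
```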
Peak data rate will not vary from project to project. All files that are encoded for a particular specification or device
will use the exact same peak values. Peak determines the quality in the hardest parts of the content, such as very fast,
complex motion. Raising the peak data rate won't change the appearance of most of the content, just the hardest scenes.
With some content, the codec might not even use the full peak bit rate range.
A great new tool for analyzing encoded WMV files is in development. It is called WMSnoop from Sliq Media Technologies, and
it enables you to see how the bit rate gets distributed, as you can see in the following illustration.
Frame size is, of course, what makes HD high definition. HD can be defined as anything bigger than the biggest standard
definition, which is the 720x576 frame size that PAL uses. The standard HD sizes for production are 1920x1080 (also referred
to as 1080) and 1280x720 (also referred to as 720). However, one great thing about Windows Media is that it
supports arbitrary frame sizes. The only requirement is that the height and width must be even numbers.
Even though the 16:9 frame size of HD means that less letterboxing is required when showing film content, a lot of HD content
will have letterboxing (black bars at the top and bottom of the frame). But since WMV files can be any shape, there is no
need to include the letterboxing while encoding. Instead, the black bars should be cropped out and the frame size reduced
correspondingly. So, when you have a 1280x720 file containing a letterboxed 2.35:1 movie, you can leave out the letterboxing
when you encode to a WMV file, and the result will be a 1280x544 movie. This provides all the picture pixels from the source, but
will be substantially easier to play back.
Encoding content that is 720 is normally done using square pixels. Encoding content that is 1080, using Windows Media, is
normally anamorphic, with the width being squeezed down from 1920 to 1440, while the height remains the same. By setting
the pixel aspect ratio value, the file will be automatically stretched to the correct aspect ratio on playback. In the 2.35:1
example above, a 1920x1080 movie would be cropped to 1920x816, and then scaled horizontally to 1440x816. Then, when it is
played back, it will be stretched back out to the correct aspect ratio to match the display.
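The crop and squeeze arithmetic can be sketched in a few lines. This is only an illustration of the numbers in the examples above, not how any particular encoding tool computes them; the function names are hypothetical, and rounding the height down to an even number follows the even-dimension requirement mentioned earlier.

```python
from fractions import Fraction


def crop_letterbox(width, height, film_aspect):
    """Return the frame size after cropping the black bars.

    film_aspect is the aspect ratio of the picture itself, for example "2.35".
    The height is rounded down to an even number of rows.
    """
    active_rows = int(width / Fraction(str(film_aspect)))
    active_rows -= active_rows % 2
    return width, min(active_rows, height)


def anamorphic(width, height, encoded_width):
    """Squeeze the width and return the pixel aspect ratio that undoes it."""
    pixel_aspect = Fraction(width, encoded_width)
    return encoded_width, height, pixel_aspect


print(crop_letterbox(1280, 720, "2.35"))   # (1280, 544)
print(crop_letterbox(1920, 1080, "2.35"))  # (1920, 816)

w, h, par = anamorphic(1920, 816, 1440)
print(w, h, par)                           # 1440 816 4/3
```

A pixel aspect ratio of 4:3 means each encoded pixel is displayed one third wider than it is tall, which restores the original 1920-pixel width on playback.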
The following illustration shows a 2.35:1 movie at 1920x1080.
The following illustration shows the original 1920x1080 source cropped to 1920x816.
The following illustration shows the source scaled horizontally to 1440x816.
The majority of consumer digital HD displays in homes today are 1280 pixels wide or less, and most analog displays
cannot resolve 1440 pixels of horizontal detail, so this anamorphic compression does not cause any apparent quality loss in most cases.
Encoding at the full 1920-pixel width only offers higher quality on high-resolution computer monitors (at least 1920x1200)
or the recently introduced 1080p displays.
When encoding HD, you'll always want to use exactly the same frame rate as your source. Using a lower frame rate will result in
choppy motion, and a higher frame rate wastes bandwidth.
Note that the PAL frame rates of 25 and 50 are whole numbers, but NTSC-derived frame rates are actually 0.1 percent slower than their nominal values. So,
60 is actually 60000/1001 (abbreviated as 59.94), 30 is actually 30000/1001 (abbreviated as 29.97), and 24 is actually 24000/1001
(abbreviated as 23.976). Windows Media can handle any of these rates, but be sure to match the frame rate of the source
precisely. Anything captured from tape would use the 0.1 percent lower values, but something rendered out of an application
like After Effects could go either way.
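The NTSC-derived rates are easiest to keep straight as exact fractions: the nominal rate multiplied by 1000/1001. A small sketch (the helper name is made up for illustration):

```python
from fractions import Fraction


def ntsc_rate(nominal_fps):
    """Exact NTSC-derived frame rate for a nominal rate such as 24, 30, or 60."""
    return Fraction(nominal_fps * 1000, 1001)


for nominal in (24, 30, 60):
    exact = ntsc_rate(nominal)
    print(nominal, exact, round(float(exact), 3))

# The slowdown is the same for every rate: 1 part in 1001, or about 0.1 percent.
slowdown = 1 - ntsc_rate(30) / 30
print(slowdown)  # 1/1001
```

Over an hour, a nominal 30 fps source would have 108,000 frames, while a 30000/1001 source has roughly 107,892, which is why a mismatch between the two eventually shows up as dropped or repeated frames.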
One special case of frame rate is telecined content. Film runs at 24 progressive frames per second (fps) (24p), and NTSC
video at 29.97 interlaced fps. When film is converted to video, it is first slowed down to 23.976, and then converted in
a pattern called 3:2 pulldown. With pulldown, the first frame of film becomes three fields of video, the second becomes
two fields, the third frame becomes three fields, the fourth becomes two fields, and so on. This is easy to detect when
you go through the source one frame at a time. You will see a repeating pattern of three progressive frames followed by
two interlaced frames. With telecined source like this, an inverse telecine must be applied to get the best results. Inverse
telecine reverses the telecine process, turning the video back into a 23.976 fps progressive sequence. Since 24p encodes
and plays back so much better than 60i, this is a big, big win for compression. Also, the 24p playback has smoother, more
accurate motion than the telecined 60i.
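The pulldown pattern is easy to see in a toy model. The sketch below labels each film frame with a letter, repeats it for three or two fields as described above, and pairs the fields into video frames. It ignores field order (top versus bottom) and real pattern detection, so it only illustrates the cadence and is not a usable inverse telecine filter; both function names are hypothetical.

```python
from itertools import cycle


def telecine(film_frames):
    """Apply 3:2 pulldown: alternate three fields and two fields per film frame."""
    fields = []
    for frame, repeats in zip(film_frames, cycle((3, 2))):
        fields.extend([frame] * repeats)
    # Pair consecutive fields into video frames.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def inverse_telecine(video_frames):
    """Recover the film frames by dropping the repeated fields."""
    film = []
    for first_field, second_field in video_frames:
        for label in (first_field, second_field):
            if not film or film[-1] != label:
                film.append(label)
    return film


video = telecine("ABCD")
print(video)
# [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'C'), ('D', 'D')]
print(inverse_telecine(video))
# ['A', 'B', 'C', 'D']
```

Four film frames become five video frames. The frames whose two fields carry different letters are the interlaced ones, and when the cycle repeats, the three matching frames (C, D, then the next A) sit together, followed by the two mixed ones.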
Windows Media Video 9 codecs support interlaced video. However, both compression efficiency and playback performance
are substantially better with progressive encoding. It is preferable to work with progressive sources when possible. And
of course telecined source should be inverse-telecined. But when dealing with true 1080/60i sources, a full 1080/60i WMV
file can be very difficult to play back on anything but Windows Media Player 10 on a very recent machine. Converting
from 1080/60i to 720/60p might be the best option. Motion smoothness will be preserved, and image quality will remain good,
while decode complexity is substantially reduced.
A key frame (known in MPEG as an I-frame) is a self-contained frame. Most frames in a WMV file will be delta frames, which
are based on the previous frame. However, it is important to have periodic key frames, in order to provide good random access,
and quick recovery from a playback glitch.
Web video tends to have infrequent key frames, like one every 10 seconds, because compression efficiency is paramount.
In contrast, HD tends to have a lot more key frames. This is because playback performance is as much of a challenge in HD
as compression efficiency. A key frame every two to four seconds is a good starting point for an HD WMV file.
One of the rarely discussed parameters in Windows Media Video encoding is decoder complexity. This sets a constraint on
internal parameters in the codec, which affects the decoding performance. For HD encoding today, you'll want to set it to
Main, instead of Auto, which is the default. That will provide the best combination of quality and playback speed.
Even though audio might only be 5 percent of the bits in an HD file, it's half the experience. And Windows Media offers
some deep functionality for providing incredible-sounding soundtracks for HD content.
Windows Media technologies support a variety of audio codecs, but only three are applicable to HD WMV files. For more information
about these and other Windows Media Audio codecs, see Windows Media Audio 9 Series Codecs.
Windows Media Audio 9
The Windows Media Audio 9 codec has been in Windows Media for the better part of a decade now. It sounds excellent at higher
bit rates. Its significant limitation is that it only supports mono or stereo audio, at 48 kilohertz (kHz) maximum, and at 16 bits of resolution.
At medium bit rates, the Windows Media Audio 9 codec offers both CBR and VBR encoding modes. At very low bit rates, it is
Windows Media Audio 9 Professional
For content that has more than stereo channels, more than 48 kHz audio, or more than 16 bits of resolution, Windows
Media Audio 9 Professional provides a full set of features for "high definition" audio. Although its minimum bit rate of
128 Kbps is not a good fit for many Web applications, it's fine for HD. This codec produces better sound than the Windows Media
Audio 9 codec at the same data rate, so it should be used for most HD projects.
Windows Media Audio 9 Lossless
Windows Media Audio 9 Lossless provides lossless encoding of audio content. This codec can reduce the size of the file by
about 50 percent, compared to the same uncompressed audio, depending on how complex the audio is. Lossless supports
the same full range of modes as the Windows Media Audio 9 Professional codec. Since the bit rate is not controlled,
the lossless codec is mainly useful for archiving. It is possible to use it for delivery, certainly, although it's almost
always overkill. Keep in mind that a full 7.1, 96 kHz, 24-bit, lossless encode will be approximately 9 megabits per second.
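That figure follows from the uncompressed rate. As a rough sketch, assuming 7.1 means eight discrete channels and taking the 50 percent reduction mentioned above as a typical ratio (actual results depend on the audio), with hypothetical function names:

```python
def pcm_bits_per_second(channels, sample_rate_hz, bit_depth):
    """Bit rate of uncompressed PCM audio."""
    return channels * sample_rate_hz * bit_depth


def lossless_estimate_mbps(channels, sample_rate_hz, bit_depth, ratio=0.5):
    """Estimated lossless bit rate in Mbps; ratio is an assumed typical value."""
    return pcm_bits_per_second(channels, sample_rate_hz, bit_depth) * ratio / 1_000_000


print(pcm_bits_per_second(8, 96_000, 24))     # 18432000
print(lossless_estimate_mbps(8, 96_000, 24))  # about 9.2
```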
Most sources today are 44.1 or 48 kHz, with some "high definition" audio at 88.2 or 96 kHz. When making HD content,
it's best to just stick with the source sample rate. With HD data rates, there is rarely a need to save bits by reducing
the sample rate, and increasing the sample rate will not improve the quality.
16-bit audio is CD quality, and that is usually enough for most listeners. But if more than 16-bit source is available,
like 20-bit or 24-bit, the Windows Media Audio 9 Professional codec offers modes to encode and decode up to 24 bits.
The stereo modes of Windows Media Audio 9 Professional only support 24-bit, but you can encode 16-bit or 20-bit sources
as 24-bit without any problems.
Traditionally, digital audio has been mono or stereo. If that's what the source is, keep it that way. If multi-channel audio
is available, like 5.1 or 7.1, use it. Even if the playback computer or device doesn't support multi-channel, the player
will gently convert the audio down to whatever speakers are available, all the way down to mono if need be.
Note that the Windows Media Encoder 9 Series and most tools do not actually expose a user interface for 7.1 audio
encoding, but 5.1 is usually available. The number of consumers that have 7.1 systems is still very small.
Given the huge data rates for HD video, audio data rates are rarely more than 5-10 percent of the file. 440 Kbps
is typical for 5.1 48 kHz content. Also, audio data rates do not have a significant impact on decoding performance.
Thus, there is rarely a need to use low bit rates for HD audio. Always use a high enough bit rate that there are no audible
encoding artifacts. A good starting point with the Windows Media Audio 9 Professional codec is to find the lowest data
rate that provides the combination of sample rate, bit depth, and channels you want, and then pick the data rate one higher
than that. Still, encoding with the Windows Media Audio 9 Professional codec doesn't provide any combinations that
will sound bad to the casual listener.
It is important to consider the computer processor, the amount of RAM, and the operating system on the computer you will
be using to encode HD.
All things being equal, compression speed is largely dependent on the speed of the processor. Note that this isn't speed
in gigahertz. Measuring processor performance purely by clock speed is like trying to figure out how fast a car can
go by looking at the revolutions per minute (RPMs).
Today's fastest, single-CPU computers use dual-core technology (either AMD or Intel). The performance available today is
much better than what it was just a couple of years ago. It generally doesn't pay to try to limp along with an older computer
for HD encoding.
The Windows Media Encoder 9 Series can use up to two processors for audio and up to four for video to efficiently distribute
the encoding load. The good news is that dual-processor, dual-core systems are becoming very competitively priced, and offer
four real processors for encoding. Given the cost of the other equipment in a digital media workstation, spending an extra
$1,000 to have a second processor is well worth it for increasing encoding speed, typically by around 60 percent.
And remember that CPU speed isn't everything for performance. Speed of storage, front-side bus, and other factors can all
play a big part in performance.
The rule for RAM is that you need to have enough, but enough is enough. If the operating system ever runs out of RAM, the
machine will slow to a crawl as virtual memory is engaged, using the hard drive for memory. However, as long as enough memory
is available, adding more will not help the speed at all.
This is easy to monitor. Run the Windows Task Manager and watch memory use. As long as at least 20% of the memory
is free at all points during an encoding session, there is enough RAM in the system. If the free memory drops below 20%,
consider adding some more.
HD encoding, of course, uses a lot more RAM. ProCoder has simultaneous rendering, which can also use a lot of RAM. For complex
ProCoder HD projects with multiple simultaneous outputs, I've needed as much as 2 gigabytes (GB) of RAM.
RAM speed matters as well. A modern compression workstation should use at least dual-channel double data rate (DDR) memory.
Compression is very sensitive to memory bandwidth, so having fast RAM can matter more than having a fast processor.
For WMV encoding, Microsoft Windows XP is definitely the operating system of choice. While many
of the tools will run on Windows 98 and higher, Windows XP provides better performance on the same hardware, especially
with multiple processors or hyperthreaded processors, like Intel's Pentium 4. Also, the latest Windows Media codecs
are only available on Windows XP.
A note on the Windows XP Professional x64 edition operating system
There has been a lot of buzz about the performance gains of Windows XP Professional x64 edition. This operating system
runs in the native 64-bit mode supported by current workstation-class chips from Intel and AMD. For some applications, running
in 64-bit optimized mode on the 64-bit OS, speed gains of 10-30 percent can be seen, mainly due to more efficient memory
use and more CPU registers.
However, this isn't a mainstream solution for WMV encoding yet, and maybe not for a while. While there is the Windows Media
Encoder 64-bit Edition, most of the source codecs that would be used with the content aren't available under 64-bit.
All other compression tools are still only 32-bit.
A few years from now, it could be a 64-bit world. But for now, 32-bit for digital media authoring is recommended.
Windows Media Video 9 Advanced Profile
The next big thing for Windows Media is the new Windows Media Video 9 Advanced Profile codec. This codec is an enhanced
version of the Windows Media Video 9 codec, with new features that can be used in more traditional video applications,
as well as improve compression efficiency in all applications.
The Windows Media Video 9 Advanced Profile codec is the future of Windows Media for consumer electronics devices.
Microsoft has submitted the codec to the Society of Motion Picture and Television Engineers (SMPTE) as the proposed VC-1 standard.
This made the details of the Windows Media Video 9 bit stream available to all, a big help for those interested in building
interoperable encoders and players. VC-1 isn't just Advanced Profile, though. It also includes support for the Simple
and Main profiles, which the original Windows Media Video 9 encoder creates and players decode. For more information
about VC-1 and the Windows Media Video 9 Advanced Profile codec, see VC-1 Technical Overview.
There are two main features in the advanced profile that make it a big improvement for use in the video industry. First,
it has improved tools for interlaced content, making interlaced content almost as easy to encode as progressive content.
This increases the compression efficiency advantage of Windows Media Video 9 over the legacy MPEG-2 used in most digital
broadcasting today. Second, it has better support for a variety of transport protocols. For example, there is a SMPTE proposal
for putting the Windows Media Video 9 Advanced Profile codec inside an MPEG-2 transport stream, meaning that legacy server
and routing products will be able to move Windows Media Video 9 Advanced Profile content around. Only the encoders and
decoders would need to be upgraded.
Today, the WMV9 Advanced Profile codec is made available by installing the Windows Media Format software development kit
(SDK) 9.5. Microsoft will be releasing an updated codec with full VC-1 compliance and many quality and performance enhancements.