Producing Multichannel Audio with Windows Media 9 Series
Jennifer Winters
Microsoft New Media Platforms Division
November 2002
Abstract
This document provides detailed information about the steps involved in producing multichannel audio in Windows Media Format. The document lists supported multichannel audio sources for Microsoft® Windows Media® Encoder, explains how you can convert your audio into one of the formats, then provides guidelines to follow when setting up an encoding session to ensure you end up with high-quality multichannel audio. This document also provides information about the configuration required for a user to listen to multichannel audio, and explains the playback behavior for users who do not have the required configuration. Finally, this document describes dynamic range control, a new feature in Windows Media 9 Series that you can use to limit the difference between the softest and loudest sound in a piece of content.
The target audience for this document is professional audio engineers, musicians, DVD authoring professionals, film/video producers, and anyone interested in multichannel audio. It is assumed that readers understand audio fundamentals and encoding basics.
Introduction
Using Microsoft Windows Media Encoder 9 Series and the Windows Media Audio 9 Professional codec, you can encode multichannel audio; this means you can encode audio for surround sound playback in six (5.1 audio) or eight (7.1 audio) channels. The format is specifically designed for CD, DVD, high-definition television, and digital cinema audio programs. Content encoded with the codec is fully optimized for streaming or download-and-play delivery at bit rates of 128 kilobits per second (Kbps) to 768 Kbps. The codec can encode the following formats:
Sampling rates: 44.1, 48, 88.2, or 96 kilohertz (kHz)
To capture multichannel audio from a file, your content source must be one of the following:
A single 6-channel or 8-channel file that has a WAVE_FORMAT_EXTENSIBLE format. This format is required when sourcing from a single WAV file, because the format uses a channel mask to specify the channels.
Six mono channel WAV files. This option is not available with 7.1 audio. Each file cannot exceed 2 gigabytes (GB) in size.
An AVI file. The file can be audio-only or contain both audio and video. Audio-only AVI files are useful because they do not have the file size limitation of WAV files. For multichannel use, the AVI file must have the WAVE_FORMAT_EXTENSIBLE audio header.
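To make the channel mask mentioned above more concrete, the following Python sketch decodes the body of a WAV "fmt " chunk and reports whether it uses WAVE_FORMAT_EXTENSIBLE and which speaker positions its mask names. The field layout and bit values follow the published WAVEFORMATEXTENSIBLE structure. The helper names `parse_fmt_chunk` and `make_51_fmt_body` are hypothetical, and this is only an illustration, not a description of how the encoder reads its sources.

```python
import struct
import uuid

WAVE_FORMAT_EXTENSIBLE = 0xFFFE

# Speaker-position bits used in the dwChannelMask field.
SPEAKER_BITS = {
    0x001: "front left",
    0x002: "front right",
    0x004: "front center",
    0x008: "low frequency (LFE)",
    0x010: "back left",
    0x020: "back right",
    0x040: "front left of center",
    0x080: "front right of center",
    0x100: "back center",
    0x200: "side left",
    0x400: "side right",
}


def parse_fmt_chunk(body: bytes) -> dict:
    """Decode the body of a 'fmt ' chunk (hypothetical helper)."""
    tag, channels, rate, _avg, _align, bits = struct.unpack_from("<HHIIHH", body, 0)
    info = {
        "extensible": tag == WAVE_FORMAT_EXTENSIBLE,
        "channels": channels,
        "sample_rate": rate,
        "container_bits": bits,
        "mask": None,
        "speakers": [],
    }
    # The extensible form adds cbSize, valid bits, the mask, and a 16-byte GUID.
    if info["extensible"] and len(body) >= 40:
        _cb_size, _valid_bits, mask = struct.unpack_from("<HHI", body, 16)
        info["mask"] = mask
        info["speakers"] = [name for bit, name in SPEAKER_BITS.items() if mask & bit]
    return info


def make_51_fmt_body() -> bytes:
    """Build a sample 5.1, 48 kHz, 24-bit extensible header for the demo."""
    pcm_guid = uuid.UUID("00000001-0000-0010-8000-00aa00389b71").bytes_le
    block_align = 6 * 3  # six channels, three bytes per sample
    fixed = struct.pack(
        "<HHIIHHHHI",
        WAVE_FORMAT_EXTENSIBLE, 6, 48000, 48000 * block_align,
        block_align, 24, 22, 24, 0x3F,
    )
    return fixed + pcm_guid


demo = parse_fmt_chunk(make_51_fmt_body())
print(demo["channels"], hex(demo["mask"]), demo["speakers"])
```

A mask of 0x3F names the six positions of a common 5.1 layout. A plain PCM header (format tag 1) carries no mask, which is why a single-file multichannel source needs the extensible form.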
You can source audio and video from separate files.
To prepare audio for encoding, you must first export the audio from your audio editing program. If your program supports exporting to the WAVE_FORMAT_EXTENSIBLE format, you are ready to begin encoding. If this is not supported, however, you can save the audio to six mono channel files (referred to as "mono stubs" in most programs), one for each channel. Then, when you set up your source in the encoder, you can specify which channel each file is associated with. The following programs have been tested with the encoder:
Steinberg Nuendo
CoolEdit Pro
ProTools
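If an editing program can produce only an interleaved multichannel export, the mono stubs described above can in principle be produced by de-interleaving the PCM data. The following Python sketch shows the idea with the standard `wave` module. The function names are hypothetical, and the assumption that the data is uncompressed, interleaved PCM comes from this example, not from any of the programs listed.

```python
import io
import struct
import wave


def split_to_mono_stubs(interleaved: bytes, channels: int, sample_width: int) -> list:
    """De-interleave PCM frames into one byte string per channel."""
    frame_size = channels * sample_width
    if len(interleaved) % frame_size:
        raise ValueError("data does not contain a whole number of frames")
    stubs = [bytearray() for _ in range(channels)]
    for offset in range(0, len(interleaved), frame_size):
        for ch in range(channels):
            start = offset + ch * sample_width
            stubs[ch] += interleaved[start:start + sample_width]
    return [bytes(s) for s in stubs]


def write_mono_wav(target, pcm: bytes, sample_rate: int, sample_width: int) -> None:
    """Write one channel's samples as a mono WAV file (path or file object)."""
    with wave.open(target, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)
        out.writeframes(pcm)


# Two frames of six-channel, 16-bit audio with easily recognized sample values.
frames = struct.pack("<12h", 1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16)
stubs = split_to_mono_stubs(frames, channels=6, sample_width=2)

# Write the first channel to memory; a real session would write six files.
buffer = io.BytesIO()
write_mono_wav(buffer, stubs[0], sample_rate=48000, sample_width=2)
```

Each resulting file can then be assigned to its channel when you set up the source in the encoder.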
You can also capture multichannel audio from a tape or live audio from a capture card. To do so, you must use a computer running Microsoft Windows® XP, and you must have a capture card that has Windows Driver Model (WDM) drivers. The following capture cards have been tested with the encoder:
Use the following guidelines when setting up an encoding session.
Match the sampling rate and bit depth of the encoded content to the source. If you must change the sampling rate, avoid non-integer conversion ratios. For example, converting from 88.2 kHz to 44.1 kHz (a 2:1 ratio) is acceptable, but converting from 96 kHz to 44.1 kHz will produce suboptimal results. If you have 20-bit source files, select a 24-bit audio format in the encoder.
Choose a bit rate and encoding mode to match your audience. Supported bit rates range from 128 Kbps to 768 Kbps. All encoding modes are supported. This includes one- and two-pass constant bit rate (CBR), quality-based variable bit rate (VBR), bit rate-based VBR, and peak bit rate-based VBR.
Use the Windows Media Audio 9 Professional codec. You can also use the Windows Media Audio 9 Lossless codec if you are encoding the content for archival purposes.
Check that the encoding system is sufficiently powerful. There are no unique hardware recommendations when you are capturing from and encoding multichannel audio to a file. However, if you are sourcing multichannel audio from a capture card, it is recommended that you use dual 733 MHz processors or higher, such as an Intel Pentium III or AMD Athlon MP.
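The sampling-rate and bit-rate guidelines above lend themselves to a quick pre-flight check. The following Python sketch flags non-integer conversion ratios and bit rates outside the 128 to 768 Kbps range. The function names and the warning strings are hypothetical; the encoder itself does not expose such a check as far as this document describes.

```python
from fractions import Fraction

SUPPORTED_RATES_HZ = (44100, 48000, 88200, 96000)
MIN_KBPS, MAX_KBPS = 128, 768


def is_integer_ratio(source_hz: int, target_hz: int) -> bool:
    """True when one rate is a whole multiple of the other (or they match)."""
    ratio = Fraction(source_hz, target_hz)
    return ratio.denominator == 1 or ratio.numerator == 1


def check_session(source_hz: int, target_hz: int, bit_rate_kbps: int) -> list:
    """Return a list of warnings for a planned encoding session."""
    warnings = []
    if target_hz not in SUPPORTED_RATES_HZ:
        warnings.append("target sampling rate is not in the supported list")
    if not is_integer_ratio(source_hz, target_hz):
        warnings.append("non-integer sampling rate conversion")
    if not MIN_KBPS <= bit_rate_kbps <= MAX_KBPS:
        warnings.append("bit rate is outside the 128 to 768 Kbps range")
    return warnings


print(check_session(88200, 44100, 384))
print(check_session(96000, 44100, 384))
```

With these inputs, the 88.2 kHz to 44.1 kHz session produces no warnings, while the 96 kHz to 44.1 kHz session is flagged because its ratio reduces to 320:147.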
To play multichannel audio, a user must have the following configuration:
A computer running Microsoft Windows XP. Dual 533 MHz or higher processors are recommended for higher bit rates.
A player that is based on the Windows Media Format 9 Series Software Development Kit (SDK).
A multichannel sound card with WDM drivers (for example, SoundBlaster 5.1, Audigy, Echo Layla, or the Delta Series from M-Audio).
A 5.1 or 7.1 speaker configuration.
Users who do not have the above configuration will still be able to hear the audio, because the audio will be folded down automatically to two channels for stereo speakers. In addition, the audio will be folded down to two channels when the audio is being copied to a portable device or to a CD. If you are encoding 5.1 audio, you can control the fold-down distribution between the surround, center, and subwoofer channels. You can use Windows Media Encoder to control the distribution before encoding, or you can use Windows Media File Editor to control the distribution after encoding. The default fold-down distribution is -3 decibels for the surround and center channels, and -12 decibels for the subwoofer (LFE) channel. To prevent clipping, the resulting stereo volume is normalized to the sum of all channels.
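The effect of the default fold-down values can be illustrated with a short calculation. The following Python sketch converts the decibel settings above to linear gains, mixes one 5.1 sample down to stereo, and divides by the sum of the gains so that full-scale input on every channel cannot clip. The mixing matrix shown here is a conventional one and is an assumption of this example; the document does not specify the exact matrix the codec uses, and the function names are hypothetical.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def fold_down(fl, fr, c, lfe, sl, sr,
              surround_db=-3.0, center_db=-3.0, lfe_db=-12.0):
    """Mix one 5.1 sample to stereo (assumed matrix, not the codec's own)."""
    g_surround = db_to_gain(surround_db)
    g_center = db_to_gain(center_db)
    g_lfe = db_to_gain(lfe_db)
    # Normalize by the sum of the gains feeding each output channel.
    norm = 1.0 + g_center + g_surround + g_lfe
    left = (fl + g_center * c + g_surround * sl + g_lfe * lfe) / norm
    right = (fr + g_center * c + g_surround * sr + g_lfe * lfe) / norm
    return left, right


# Full-scale input on all six channels stays at full scale after fold-down.
left, right = fold_down(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(round(left, 6), round(right, 6))
```

Under this assumed matrix, -3 decibels corresponds to a gain of about 0.71 and -12 decibels to about 0.25, so the subwoofer channel contributes far less to the stereo mix than the center or surround channels.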
During playback, the audio is automatically re-sampled to match the capabilities of the audio card. For example, the codec will re-sample from 96 kHz to 48 kHz if necessary, or from 88.2 kHz to 44.1 kHz. In addition, the bit depth will be re-quantized to 16 bits, if required by the sound card.
Using the Surround Sound Mode to Play Multichannel Audio
If a user has a two-channel sound card, but also has a Dolby Pro Logic-style decoder downstream, the user can enable the Surround Sound mode to get four-channel output from the audio system (left/right/center/rear). This configuration is popular with many game systems. To enable the mode, the user simply sets the audio properties to Surround Sound Speakers in the Control Panel. After that, the decoder matrix encodes the 5.1 channels onto the stereo output, using a left/right-style mix.
The peak and average values of the audio signal are calculated during encoding, and those values are placed in the header of the Windows Media file. During file playback, users can limit the difference between the softest and loudest sounds (the dynamic range) in the file by using the Quiet Mode feature in the Player. This is useful, for example, for movie content that has a wide dynamic range; a user can limit the maximum loudness while maintaining voice intelligibility. (This feature is only available when the file is played on a computer running Microsoft Windows XP, and using a player that is built on the Windows Media Format SDK.)
The Quiet Mode feature of the Player has three settings that affect dynamic range: off, little, and medium. By default, the settings affect the audio dynamic range during playback as follows:
Off. If the user has not turned the Quiet Mode feature on, then content is played in full dynamic range.
Little difference. The peak value of the audio signal is limited to 6 decibels above the average level.
Medium difference. The peak value of the audio signal is limited to 12 decibels above the average level.
When you use Windows Media File Editor to edit the file, you can specify peak and average values that differ from those calculated during encoding. Typically, it is recommended that you adjust only the peak value. Adjusting the average value will not compress the difference between loud and soft sounds. Instead, it will cut or boost the overall average volume of the entire piece, which may produce undesirable distortion during playback. Changing the peak value changes the settings of the Quiet Mode feature as follows:
Off. If the user has not turned the Quiet Mode feature on, then content is played in full dynamic range.
Little difference. The peak value of the audio signal is limited to the median of the peak and average values you specified in Windows Media File Editor.
Medium difference. The peak value of the audio signal is limited to the peak value you specified in Windows Media File Editor.
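The two sets of Quiet Mode rules above can be summarized in a few lines of Python. This sketch assumes that the peak and average values are expressed in decibels and reads "median of the peak and average values" as their midpoint; both are assumptions of this example. The function name is hypothetical and does not correspond to any Windows Media Format SDK call.

```python
def quiet_mode_peak_limit(mode: str, average_db: float, custom_peak_db=None):
    """Return the peak limit in dB for a Quiet Mode setting, or None for off."""
    if mode not in ("off", "little", "medium"):
        raise ValueError("mode must be 'off', 'little', or 'medium'")
    if mode == "off":
        return None  # full dynamic range, no limit applied
    if custom_peak_db is None:
        # Default behavior: a fixed headroom above the average level.
        headroom_db = {"little": 6.0, "medium": 12.0}[mode]
        return average_db + headroom_db
    if mode == "little":
        # Midpoint of the peak and average values set in the file editor.
        return (custom_peak_db + average_db) / 2.0
    return custom_peak_db  # medium: the specified peak itself


print(quiet_mode_peak_limit("little", -20.0))
print(quiet_mode_peak_limit("medium", -20.0, custom_peak_db=-4.0))
```

For an average level of -20 decibels, the default rules give limits of -14 decibels (little) and -8 decibels (medium). With a custom peak of -4 decibels, the limits become -12 decibels and -4 decibels.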
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Microsoft, MS-DOS, Windows, Windows Media, Windows NT, ActiveSync, ActiveX, Direct3D, DirectDraw, DirectInput, DirectMusic, DirectPlay, DirectShow, DirectSound, DirectX, FrontPage, JScript, Microsoft Press, MSN, NetShow, Outlook, PowerPoint, SQL Server, Visual Basic, Visual C++, Visual InterDev, Visual J++, Visual Studio, WebTV, Win32, and Win32s are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.