Abstract
With the Microsoft Windows Media Audio and Video 9 codecs, Windows Media Encoder 9 Series, and Windows Media Capture 9 Series, you can create high-quality Windows Media-based audio and video content. To take full advantage of the quality and features available with Windows Media, you need an encoding workstation capable of capturing and playing high-bandwidth video. This article describes an optimal setup for creating high-quality, high-bandwidth content.
A production workstation is a system of hardware and software components you can use to create digital media. For example, a workstation for creating Microsoft® Windows Media®-based content may be as simple as a computer and video camera for streaming live traffic conditions. More complex workstations can be part of a complete audio and video production facility. The type of production workstation you put together depends on your needs, your desired digital media quality, and your budget.
This article describes what you need to build a workstation for capturing and encoding high-quality, high-bandwidth files that can play back full frame at the original frame rate. With Windows Media Encoder and Windows Media Encoding Script, you can create high-bandwidth files that can be downloaded or streamed over a broadband network. With Windows Media Capture 9 Series, you can create uncompressed AVI files, from which you can then create Windows Media files in the encoder. The features and codecs in these capture and encoding tools enable you to create content that approaches DVD quality at a fraction of the bit rate and file size.
To capture and encode high-quality content, a workstation must be capable of high-quality analog-to-digital conversion. In addition, the computer must have the speed and memory to handle far
more than the typical number of bits per second. The initial hardware investment may be higher than the cost of a low-bandwidth system, but the results are well worth it if you need high quality.
This article provides recommendations for a minimum and an optimal system for encoding Windows Media-based content. The recommendations also take into account system usage; that is, a computer
used to capture video must have a great deal more speed and memory than one used for file-to-file conversions. For detailed recommendations, see the Hardware Recommendations
section of this article.
To take advantage of the high-bandwidth features and codecs, you need to be able to capture audio and video at its highest quality. A computer that can capture a small image size at 15 frames
per second may not be able to handle video with four times the image size and twice the frame rate.
When video is converted from analog to digital, each frame is broken up into hundreds of thousands of pixels. Each pixel is represented by one or more bytes that describe the color of that small area of
the image. Each video conversion requires a certain amount of computer memory and computation time. The higher the quality of video, the greater the number of frames per second; the larger the
image size, the greater the number of pixels that must be converted in a given period of time. For example, when capturing video at 30 frames per second, the computer must not only be able to
handle many pixel conversions, it must also perform many conversions very quickly in order to keep up with the continuous stream of video.
The computer must also be able to handle the audio conversion simultaneously. The smallest unit of audio is called a sample. High-quality audio requires more samples per second, plus a computer
that has the speed and memory to process the continuous stream.
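The arithmetic behind these statements can be sketched in a few lines of Python. This is a minimal illustration, not part of any Windows Media tool: the function names are hypothetical, and the figure of 2 bytes per pixel assumes a 16-bit pixel format such as YUY2.

```python
def video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate: every pixel of every frame must be converted."""
    return width * height * bytes_per_pixel * fps


def audio_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate: one sample per channel, sample_rate_hz times a second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# Assumption: 2 bytes per pixel (a 16-bit format such as YUY2).
small_rate = video_bytes_per_second(320, 240, 2, 15)   # small image, 15 fps
full_rate = video_bytes_per_second(640, 480, 2, 30)    # 4x the pixels, 2x the frame rate
audio_rate = audio_bytes_per_second(48_000, 16, 2)     # 48 kHz, 16-bit stereo

print(f"Small capture: {small_rate:,} bytes per second")
print(f"Full capture:  {full_rate:,} bytes per second")
print(f"Full capture is {full_rate // small_rate} times the load of the small capture")
print(f"Audio:         {audio_rate:,} bytes per second")
```

Four times the image size at twice the frame rate works out to eight times the data, which is why a computer that copes with the small capture may fall behind on the full one.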
To encode directly to a Windows Media file or broadcast stream from audio and video capture cards, you will typically use Windows Media Encoder. However, you can perform basic encoding tasks
from the command line with Windows Media Encoding Script. In the optimal system, rather than capture and encode in one step, an AVI file is first captured with the Windows Media Capture 9 Series
utility. Then the final Windows Media Video file is encoded from the AVI file. This two-step method is often preferred for encoding high-bandwidth video, because it requires less processing
power and memory, and therefore can help ensure a higher quality capture.
Considering Compression
The digital audio and video streams of samples and pixels are measured as the bit rate of the content, such as 700 kilobits per second (Kbps). High-quality professional video has a bit rate
that far exceeds the capacity of most computers and networks: 270 megabits per second (Mbps). In order to work with digital media on a computer and stream it over a network, it must be compressed.
The program that compresses and decompresses digital media is called a codec.
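To put the two bit rates mentioned above side by side, the following Python sketch computes the compression ratio a codec must achieve. It assumes that Kbps means 1,000 bits per second and Mbps means 1,000,000 bits per second; the function name is illustrative only.

```python
def compression_ratio(source_bps, target_bps):
    """How many times smaller the compressed stream is than the source stream."""
    return source_bps / target_bps


PROFESSIONAL_VIDEO_BPS = 270_000_000  # 270 Mbps, the professional video rate cited above
EXAMPLE_CONTENT_BPS = 700_000         # 700 Kbps, the example content bit rate cited above

ratio = compression_ratio(PROFESSIONAL_VIDEO_BPS, EXAMPLE_CONTENT_BPS)
print(f"Required compression: about {ratio:.0f}:1")
```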
The Windows Media Audio 9 and Windows Media Video 9 codecs offer a great deal of user flexibility because they are highly scalable, which means you can choose the amount of compression and the
bit rate for a wide variety of scenarios. For example, you can sacrifice some quality and compress to very low bit rates for streaming media at telephone modem speeds, or maintain high quality
and high bit rates to stream or download over a high-bandwidth network or to save to a CD.
When designing your production workstation, you must consider the codec and bit rates you will use. Analog-to-digital conversion and the compression process require a great deal of processing
power for high bit rate content. If you follow the recommendations for the optimal system, your workstation will be able to handle the workload and create high-quality content.
The Key to High Quality
There are two primary methods for encoding video:
Capturing and encoding from a live stream directly to a Windows Media file as a real-time process.
Capturing to an uncompressed AVI file, and then encoding a Windows Media file from the AVI file in non-real time.
Real-time encoding is best used for live broadcasting, in which the primary stream is played back as it is being encoded. Real-time encoding requires a faster CPU and more random access memory
(RAM). If the CPU cannot keep up with the real-time processing, it drops frames of video and motion can appear jerky.
Capturing to an AVI and then encoding in non-real time usually produces higher quality results. This method requires large amounts of hard disk space and faster system components (described
in more detail later). However, the method does not require a fast CPU for encoding; complex encoding modes just take more time. By definition, the non-real time process is slower and requires
more processing steps.
Except where noted, this article describes the system requirements for the non-real time encoding method.
The secret to capturing high-quality, high-bandwidth video is to use a computer system that can handle the bandwidth. A fast CPU with a large amount of RAM is helpful. However, a fast peripheral
component interconnect (PCI) bus, a fast hard disk with sufficient storage capacity, and, if you plan to stream to other computers, a network connection that can handle the high bandwidth are requirements.
Capture cards and external hardware should also be capable of producing high-quality images and sound. (For detailed recommendations, see the Hardware Recommendations
section of this article.) There are several items you need to consider when building a high-quality workstation:
Source quality. The final product can be no better than the source. Make sure to use a high-quality, high-resolution videotape format, such as Digital Betacam or Mini-DV. With the
proper digital interface, such as the IEEE 1394 (sometimes called FireWire) interface or the professional Serial Digital Interface (SDI), you can skip the analog-to-digital conversion that
reduces quality. If you must convert from analog, make sure your video source is high quality. Use S-video connections, if available, and a professional-quality playback deck. Most of today's
professional or semi-professional videotape decks are capable of producing suitable quality.
If you are capturing from a tuner, demodulator, film scanner, or router, make sure the cables and connections are professional quality and working properly, and the radio frequency
(RF) connections to the tuner or demodulator are properly adjusted and terminated. Not only does poor source quality result in a poor-quality product, but any noise, glitches, or instability
in the picture can also increase the bit rate and size of the final file. The codec cannot distinguish detail in the video image from the detail in video noise, and attempts to reproduce the
video imperfections just as faithfully as the rest of the image.
Fast CPU, PCI bus, and a large hard drive and RAM. A fast CPU enables a computer to keep up with the demand imposed by the continuous stream of bits, while a fast PCI bus moves
those bits easily between the capture device and the processor. A large amount of RAM eases the load on the CPU by enabling bits to be cached as they are converted. A large hard disk with
a fast access time eases the load on the computer by writing data quickly and efficiently. As you capture data, you can use the System Monitor included with the Microsoft Windows® operating
system to view CPU and memory usage. If the CPU percentage often hits 100 percent, there is a very good chance the capture quality will be impaired. If possible, use a computer with dual
PCI buses. Even a very fast single PCI bus may not be able to handle both the bit stream produced by the capture card and the stream going to the hard disk drive.
High-quality capture cards. A capture card is responsible for properly inputting and processing the audio and video signals, and then converting them into a digital stream of bits.
For this reason, the video capture card is the most important link in the chain. A low-quality or outdated capture card can greatly reduce the quality of the video. After the video has been
converted to digital form, the data can be stored, transferred, and copied without affecting quality.
With Windows Media Audio 9 and Windows Media Video 9 codecs, you can achieve near-VHS quality at 250 Kbps and near-DVD quality at 750 Kbps with a variable bit rate (VBR) encoding mode. You can
also improve the quality of high-bandwidth content by taking advantage of the following features:
Deinterlacing. When encoding a video file captured at the full frame size of 640 x 480 pixels, the two interlaced fields contained in a single frame of NTSC video must be converted
to one complete frame that can be displayed on a computer monitor. Computer monitors use a different method of displaying video called progressive scanning, which does not use interlaced
fields. The deinterlacing feature converts interlaced video frames into progressively scanned frames, creating a cleaner, sharper image with fewer motion artifacts at both a full frame size
and at 320 x 240 pixels.
Inverse telecine. To convert film, which plays back at 24 frames per second (fps), to National Television System Committee (NTSC) standard video, which plays at 29.97 fps, a telecine,
such as a film scanner, adds redundant fields to the video. When encoding video of a film, you can use the inverse telecine filter to remove those redundant fields and return the video to
its original frame rate. The final encoded video appears more like the original film; and, with fewer frames, the file size and bit rate are lower. Inverse telecine does not apply
to Phase Alternate Line (PAL) video.
50 or 60 frames per second. You can create high-quality video that has very smooth and crisp motion by using the deinterlacing filter to encode from 640 x 480 pixels to 320 x 240
pixels, and then converting the video fields to frames. The 50 fields per second (PAL) or 60 fields per second (NTSC) are converted to 50 or 60 frames per second.
Variable bit rate. Any on-screen movement results in an increase in the bit rate of a video, because new pixels must be generated from frame to frame. To stream over a network with
Windows Media Services, you would use constant bit rate (CBR) encoding to constrain the bit rate to the available bandwidth. With CBR encoding, Windows Media Encoder keeps the bit rate below
a specified level by dynamically reducing the quality of the video, or dropping frames if necessary. Playback is not always as smooth as one would like, but the end user experiences as smooth
a presentation as possible for a given bandwidth.
For video that will not be streamed, you have the option of VBR encoding. VBR-encoded video cannot be streamed, but you can use it for content that is destined to be downloaded
or played back locally or over a fast network. With VBR encoding, the integrity of high motion or rapid changes in the video is maintained by simply allowing the bit rate to vary as needed.
You set the desired quality level, and the bit rate changes to maintain that quality.
Two-pass CBR encoding. When using two-pass CBR encoding, content passes through the encoder twice. The first time, the encoder analyzes the complexity of the content. During the
second pass, the analysis is used to encode the content. Two-pass CBR encoding produces a much cleaner and smoother video than the one-pass CBR method. Two-pass CBR encoding cannot be used
with a broadcast stream.
Two-pass VBR encoding. For two-pass VBR encoding, you specify a desired bit rate instead of a desired quality level. The encoder then adjusts the VBR quality level to create a file
that is the equivalent size of a CBR file of the same specified bit rate. During the first pass, the encoder estimates what the final file size will be, and during the second pass it adjusts
the quality level in order to create a file with the desired file size. Unlike two-pass CBR encoding, the data from the first pass is not used during the second pass.
Nonsquare pixels. A pixel is the smallest unit of a digital image or frame. A resolution of 640 x 480 pixels produces a frame aspect ratio of 4:3 if the pixels are square. By using
nonsquare pixels, which are supported in Windows Media, you can create any number of different aspect ratios and resolutions. For example, you can maintain the original resolution of digital
video (720 x 480), which uses nonsquare pixels.
Multiple bit rate (MBR) audio and video. You can encode one Windows Media file or stream that contains multiple streams encoded at different bit rates. MBR does not itself improve
quality, but it enables you to provide the highest quality content for a wider variety of user bandwidths. For example, in one MBR file you can support users connecting to the Internet with
slow telephone modems, while providing high-quality content for users with broadband connections.
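The inverse telecine feature described in the list above can be illustrated with a simplified Python model. This sketch is not how Windows Media Encoder implements the filter. It assumes a 2:3 pulldown cadence, labels each film frame with a distinct letter, and ignores field parity and cadence detection, all of which a real filter has to handle.

```python
from itertools import cycle, groupby


def telecine(film_frames):
    """Apply 2:3 pulldown: film frames alternately contribute 2 and 3 video fields."""
    fields = []
    for frame, repeat in zip(film_frames, cycle((2, 3))):
        fields.extend([frame] * repeat)
    return fields


def inverse_telecine(fields):
    """Collapse each run of identical fields back into a single film frame.

    Assumption: consecutive film frames carry distinct labels.
    """
    return [frame for frame, _ in groupby(fields)]


film = ["A", "B", "C", "D"]                 # 4 film frames
fields = telecine(film)                     # 10 video fields
video_frames = [tuple(fields[i:i + 2]) for i in range(0, len(fields), 2)]
restored = inverse_telecine(fields)

print(video_frames)  # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(restored)      # ['A', 'B', 'C', 'D']
```

Four film frames become five video frames, which matches the step from 24 fps to roughly 30 fps. Removing the redundant fields restores the original four frames, so the encoder has fewer frames to compress.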
By using these features with the Windows Media Audio and Video codecs, you can create high-quality pictures and sound at a fraction of the file size and bit rate of conventional digital media.
The typical computer display has a far higher resolution and faster frame rate than an analog television. This enables you to produce video with Windows Media that is higher quality than standard
television. You can even encode high-bandwidth video that rivals the quality of film in a theater when using a digital cinema projector.
However, to do all this you need a system that can capture and maintain high quality.
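As a rough indication of what "a fraction of the file size" means in practice, the following Python sketch estimates the size of an encoded video stream from its bit rate and duration. It assumes that Kbps means 1,000 bits per second, counts the video stream only, and uses a hypothetical function name.

```python
def encoded_size_bytes(bit_rate_bps, duration_seconds):
    """Approximate stream size: bits per second times seconds, converted to bytes."""
    return bit_rate_bps * duration_seconds // 8


TWO_HOURS = 2 * 60 * 60  # seconds

near_vhs = encoded_size_bytes(250_000, TWO_HOURS)   # 250 Kbps
near_dvd = encoded_size_bytes(750_000, TWO_HOURS)   # 750 Kbps

print(f"Two hours at 250 Kbps: about {near_vhs / 1e6:.0f} MB")
print(f"Two hours at 750 Kbps: about {near_dvd / 1e6:.0f} MB")
```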
The following section shows the recommended minimum and optimal hardware for capturing high-bandwidth content to an AVI file and encoding high-bandwidth Windows Media files from AVI files.
Computer. A fast processor is useful, but not required for capturing video to an AVI file. Keep in mind, though, that the faster the computer is, the shorter the encoding time will
be. A high-bandwidth, two-hour AVI file, for example, might take several hours to encode on a slow computer. More important than raw CPU clock speed is a fast front-side bus that will help
transfer the video data to the hard disk without any conflicts or speed bottlenecks. For the latest information, see Windows Media
Encoder 9 Series System Requirements.
Hard disk. To capture high-bandwidth content, your hard disk must be capable of a sustained transfer rate of 27 megabytes per second (MBps). Although many hard disks are rated with higher
speeds, they might not be capable of maintaining that speed throughout the capture of a two-hour movie, which can result in dropped frames. It is recommended that you use Ultra160 small
computer system interface (SCSI) drives in a RAID 0 striped configuration. This configuration writes, or stripes, data across multiple hard disks, but appears as one drive on your desktop.
The SCSI disk array can be expensive. A less expensive alternative uses four integrated device electronics (IDE) drives with an IDE RAID controller board.
The size of the hard disk depends on how much content you plan to store. For example, if you plan to store a two-hour movie as an uncompressed AVI file, you will need 80 to 120
GB of space, depending on the resolution and pixel format that you capture to.
Because storage hardware changes quickly, it is recommended that you compare the options available at the time you build your production workstation.
Video capture card. Windows Media Encoder works with most capture devices that have Video for Windows or Windows Driver Model (WDM) drivers. To capture high-quality, full-frame,
full-frame-rate video to an AVI file, however, the list of capture cards narrows. Professional cards such as the Targa 3000 from Truevision, the Osprey-500 card from ViewCast, and the
Reality Studio Digital Disk Recorder from DPS are well suited for high-quality video capture.
Some cards were created to capture content to be encoded to Windows Media Format. The Osprey-500 and Winnov Videum II, for example, handle much of the processing needs, so the
computer's CPU and memory are free to compress and encode. The Osprey-500 also provides real-time deinterlacing and hardware-based digital video (DV) decoding for capturing MiniDV video.
The cards can input video from a number of source types, such as MiniDV, SDI video and professional Digital Audio, or analog.
All of the cards that support video capture to a Windows Media file are listed on the Windows Media Hardware Product Vendors page at the Microsoft
Web site. Keep in mind, however, that some of these cards may not produce the quality required to capture high-bandwidth video.
Sound card. Many video capture cards also capture sound. In the optimal system, video and audio are captured digitally. Some cards are capable of capturing audio digitally or through an
analog connection. Any high-quality sound card is suitable for capturing analog audio, but if you plan to capture audio digitally, make sure the card is capable of synchronizing to a digital
source. Many cards also support multi-channel audio, such as 5.1 and 7.1 audio, which can be encoded directly to a Windows Media file or broadcast stream.
Network card. You should design your workstation to be part of a network. Though files can be transferred to other computers with removable hard disks, it is far more efficient
to connect the computers in a network. You can also connect the workstation through a proxy computer or firewall to the Internet, so that you can transfer files directly to your Web server
or Windows Media server by using File Transfer Protocol (FTP). It is recommended that you use a fast network, such as an Ethernet 100BaseT system. The network cards and hub devices are inexpensive,
and with the 100 Mbps data rate, you can copy large files quickly or play high-bandwidth files across the internal network. During the capture of an uncompressed AVI file, however, you should
disable file sharing, or even disable the network completely, to eliminate the potential of external users accessing your system and hard disk.
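The hard disk figures in the list above depend on resolution and pixel format. The following Python sketch shows how the sustained write rate and the space for a two-hour capture can be estimated. It is an approximation under stated assumptions: video only (audio and AVI container overhead are ignored), 16 bits per pixel for YUY2 and 12 bits per pixel for YV12, a frame rate of 29.97 fps, and decimal megabytes and gigabytes. The function names are illustrative.

```python
# Bits stored per pixel: YUY2 is packed 4:2:2, YV12 is planar 4:2:0.
BITS_PER_PIXEL = {"YUY2": 16, "YV12": 12}


def capture_rate_bytes(width, height, pixel_format, fps):
    """Sustained write rate, in bytes per second, for uncompressed video."""
    return width * height * BITS_PER_PIXEL[pixel_format] / 8 * fps


def storage_gb(rate_bytes_per_second, hours):
    """Disk space in decimal gigabytes for a capture of the given length."""
    return rate_bytes_per_second * hours * 3600 / 1e9


for width, height in ((640, 480), (720, 480)):
    for pixel_format in ("YV12", "YUY2"):
        rate = capture_rate_bytes(width, height, pixel_format, 29.97)
        print(f"{width} x {height} {pixel_format}: "
              f"{rate / 1e6:.1f} MB per second, "
              f"{storage_gb(rate, 2):.0f} GB for two hours")
```

The estimates land in the same range as the storage figures quoted for a two-hour movie, and they show why the choice of pixel format matters as much as the frame size.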
The following section describes the basic software needed to capture AVI files and encode Windows Media files and broadcast streams. Your complete workstation may also include programs to edit
video and design audio. You may also decide to install a packaged system that includes a capture card and editing system. If you do so, make sure the system is capable of capturing and editing
full-frame video with no compression. In the optimal system outlined in this article, the highest quality is achieved by capturing data directly to the hard disk with no audio or video compression.
Operating system. Windows Media Encoder 9 Series and the related encoder utilities can run on Microsoft Windows 2000 and Windows XP. It is recommended that you use the NTFS file
system so that you can save files larger than 4 GB, which is the limit with the FAT32 file system. Before selecting an operating system, make sure your video capture card provides the appropriate
driver. For example, some cards may only run on Windows 2000 Professional.
Capture to AVI. You can use any program that enables you to capture full size (720 x 480 pixels for NTSC, 720 x 576 for PAL), full frame rate (29.97 or 25 frames per second), uncompressed
video.
Depending on your capture card, you should capture to a YUY2 or YV12 pixel format. YUY2 produces a larger file, but provides more flexibility; YV12 produces a smaller file, but
cannot be converted to other pixel formats as easily. The closely related IYUV and I420 formats hold the same data as YV12 with the U and V planes in the opposite order. For more information, see The
(Almost Definitive) FOURCC Definition List.
The optimal system uses the simple Windows Media Capture 9 Series utility for capturing uncompressed AVI files with mono, stereo, 5.1, or 7.1 channels of audio, with up to 24-bit resolution
and sampling rates up to 192 kHz. You can download the utility from the encoder section of the Windows Media Download Center.
Edit the AVI. There are a number of video editing and sound design programs, such as Adobe Premiere 6 and Sonic Foundry Vegas Video 4.0, with which you can work with full-frame
uncompressed video. If you require no additional editing, you can simply encode the captured AVI file directly. For more information about these products, see the Sonic
Foundry and Adobe Web sites.
Capture and encode to Windows Media. Use Windows Media Encoder 9 Series, which you can download from the Windows Media Download
Center. You can use Windows Media Encoder to capture directly to a Windows Media file or to a live stream for distribution to a Windows Media server for live broadcasting. You can
also encode from an AVI file. You can also encode content with the Windows Media Encoding Script command line utility.
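The file size limit mentioned under the operating system item above is easy to underestimate. The following Python sketch estimates how long an uncompressed capture can run before a single file reaches the FAT32 limit. It assumes a maximum file size of 2^32 - 1 bytes, a 720 x 480 YUY2 capture at 29.97 fps with 2 bytes per pixel, and no audio or container overhead. The names are hypothetical.

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1  # largest single file on a FAT32 volume


def seconds_until_limit(bytes_per_second, limit=FAT32_MAX_FILE_BYTES):
    """Capture time available before one file reaches the size limit."""
    return limit / bytes_per_second


# Assumption: 720 x 480 pixels, YUY2 (2 bytes per pixel), 29.97 frames per second.
capture_rate = 720 * 480 * 2 * 29.97
capture_seconds = seconds_until_limit(capture_rate)

print(f"Capture rate: {capture_rate / 1e6:.1f} MB per second")
print(f"Time until the file size limit: {capture_seconds / 60:.1f} minutes")
```

At this data rate the limit arrives after only a few minutes of video, which is why NTFS is recommended for capture volumes.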
In addition to a properly equipped computer for capturing and encoding data, a complete system can include the following components:
High-quality playback source. A basic system captures video directly from a source such as a videotape recorder, satellite decoder, or film scanner. However, a more complete production
facility may use video routers to share sources among several workstations, editing suites, or control rooms. Whichever system you employ, make sure the video signal that reaches the capture
card is as clean, stable, and noise-free as possible. Noise and instability cannot be removed after a capture, and imperfections will add to the bit rate and size of the final encoded file.
In the optimal system, video is captured directly by using an SDI connection from a Digital Betacam videotape deck.
Mixer with reliable meters and speakers. In a basic system, you can capture audio directly from the source. However, in the optimal system, it is suggested that you use a small
mixer. A mixer is helpful because the software mixer controls that come with most sound cards do not offer the flexibility needed to mix multiple sources and monitor a capture. Note that
if you plan to capture audio digitally, you will need a digital mixer to mix sources.
The following illustration shows the audio and video connections as part of an optimal workstation. Note that, in this system, audio is captured directly from the AES/EBU connections
on the recorder. This design uses a small analog audio mixer to monitor the audio. The digital audio is connected directly to the capture card, and the analog outputs from the Betacam
player and Osprey-500 connect to two stereo channels of the mixer. Stereo speakers are connected to the monitor output of the mixer.
Figure 1. Layout of an optimal production workstation.
High-quality speakers are recommended. However, if you are using the workstation for straight captures, a set of small computer speakers should be adequate. If you plan to edit
and do any sound design with the workstation, you should invest in professional monitor speakers that faithfully reproduce the full frequency spectrum without coloring or distorting the
sound.
Unity Gain
One of the most common mistakes made when capturing audio and video is not maintaining unity gain throughout the system. Unity gain means that at every point in a production system where signal
adjustments can be made, the integrity of the original audio and video is maintained by the correct calibration of the audio and video levels. Video that was captured with audio and video levels
correctly calibrated for unity gain looks and sounds identical to the original source. However, too often the correct process is not followed when setting levels. The result can be distorted
or noisy audio, mismatched colors, or clipped video.
The complete procedure for achieving unity gain depends on your source and capture hardware, and is outside the scope of this article. However, you can use the following process as a guide:
Use color bars and a test tone. If available, use color bars and the test tone recorded on your tape or other source, such as a satellite. Note that color bars and tone from one
source will be different from another source. In other words, you cannot rely on color bars from one satellite feed to match those of another feed or a tape.
Figure 2. SMPTE color bars display used to calibrate video.
Adjust the source. Play the color bars and tone. If you have a waveform monitor and vectorscope, connect them to the analog outputs of the source and adjust the following settings
for unity gain:
Video level
Setup
Chrominance
Phase (hue)
Also, adjust audio to 0 volume units (VU) if the control is available. If you are capturing digitally, you do not need to make any adjustments to the source. If you do not have
a waveform monitor or vectorscope, you can switch the settings to their default unity gain positions. Without calibration equipment, you must trust that the gains have been properly set to unity.
Adjust the capture card. Play the color bars and tone. Use a software waveform monitor/vectorscope, if available, to adjust video settings on the capture card to unity. Some high-end
editing systems include these tools. After you set the card to unity, you may only need to make occasional minor adjustments to account for drift introduced by temperature and the aging
of hardware components. If you do not have a software waveform/vectorscope, set the controls to their default or unity positions. The alternative is to make adjustments by eye with a well-calibrated
video monitor. However, this is not the recommended approach if better alternatives are available.
Figure 3. Example of a software vectorscope displaying correctly adjusted color bars.
To adjust the audio to unity, use a program that provides an accurate VU meter. While playing back a 0 VU tone, adjust the line-in mixer control of the sound card to display
the proper reading on the VU meter.
Figure 4. A software VU meter showing a tone calibrated to 0 VU.
A test tone will often be recorded at -12 or -18 decibels (dB). This allows room, called headroom, to help avoid the digital distortion that occurs if audio levels
exceed 0 VU.
Figure 5. A software VU meter showing a tone calibrated to -12 dB.
High-end sound cards often come with reliable meters, as well. If you are capturing audio digitally, no adjustment is necessary. If the mixer software that comes with your sound
card does not provide a meter, you cannot properly adjust the audio for unity.
Adjust audio on the mixer. If you use a mixer with a built-in VU meter, you can adjust the input to read 0 VU on the meter after you have established unity gain in the sound card.
Mark the fader positions and use that meter as a quick reference when capturing. To adjust the speaker level, use a monitor or phones output if available. Keep in mind that if you use the
channel fader to adjust the listening level, the meter will no longer give you an accurate measure of the record level.
Avoid audio resampling. You should capture audio at the same sampling rate with which it was recorded on the source media. If you capture at a different rate, capture and encoding
programs will resample the audio data, which degrades quality. Use the source sample rate when configuring the AVI capture program, and when selecting an encoding profile. Audio that is
recorded in the Digital Betacam format has a sampling rate of 48 kHz; the Mini-DV format can be either 32 kHz or 48 kHz; the audio CD rate is 44.1 kHz; and DVD content can be either 48 kHz
or 96 kHz. If you do need to resample the audio, try to avoid non-integer resampling ratios. For example, resampling from 96 kHz to 48 kHz will produce cleaner results than resampling from 96 kHz to 44.1 kHz. Also,
if you do need to resample the audio, you will get better results if you capture at the original sample rate (maintaining a clean source file on your system) and then resample the audio
with a high-quality audio tool, such as Sound Forge, Cool Edit, or Digidesign Pro Tools.
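Whether a planned conversion uses an integer resampling ratio can be checked with a short Python sketch. This only classifies the ratio between two sampling rates and does not resample any audio. The function names are illustrative.

```python
from fractions import Fraction


def resample_ratio(source_hz, target_hz):
    """Source-to-target sampling rate ratio, reduced to lowest terms."""
    return Fraction(source_hz, target_hz)


def is_integer_ratio(source_hz, target_hz):
    """True when one rate is a whole-number multiple of the other."""
    ratio = resample_ratio(source_hz, target_hz)
    return ratio.denominator == 1 or ratio.numerator == 1


for source_hz, target_hz in ((96_000, 48_000), (96_000, 44_100), (48_000, 44_100)):
    ratio = resample_ratio(source_hz, target_hz)
    kind = "integer" if is_integer_ratio(source_hz, target_hz) else "non-integer"
    print(f"{source_hz} Hz to {target_hz} Hz: ratio {ratio} ({kind})")
```

Converting 96 kHz to 48 kHz is an exact 2:1 reduction, while 96 kHz to 44.1 kHz reduces only to 320:147, so the resampler has to interpolate between source samples.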
With the optimal system properly set up, you can begin capturing and encoding content. Before you start, make sure only those programs and processes necessary for capturing data are open. For
example, close virus detection programs, screen savers, personal Web servers, and e-mail programs, and disconnect any mapped network drives. If you are on a network, make sure other computers are
not accessing services or files on the workstation.
Begin by opening the Windows Media Capture 9 Series utility or any suitable capture program. Capture the video full-frame and uncompressed at 29.97 frames per second (NTSC) to an AVI file. Capture
the audio at a sampling rate of 44.1 kHz or 48 kHz to match the source, with a bit depth of 16 bits, in stereo or mono, depending on the source.
After the AVI has been saved, open Windows Media Encoder 9 Series and encode a Windows Media Video file from the AVI. Alternatively, you can capture and encode in two steps using the encoder
only. The encoder provides a method for capturing temporarily to the hard disk if you are using a two-pass encoding method or anytime you want to encode and compress in a second step. For more
information about encoding high-bandwidth files, see Creating High-Quality Content with Microsoft Windows Media Encoder and
Windows Media Encoder Help.
When the file is finished, test playback in Microsoft Windows Media Player 9 Series, and then copy the file to your Web server or Windows Media server for distribution. Remember, if you want
to stream the file from a Windows Media server, you should encode the file with a constant bit rate. If a client computer does not have Windows Media Audio and Video 9 codecs, they will be installed
automatically when Windows Media Player attempts to play the content. Only the standard Windows Media Audio 9 codec is supported in Windows Media Player 6.4.