This article discusses hardware implementation details for high-quality video and audio decoding under the broadcast architecture for Microsoft Windows operating systems.
| Introduction | |
| Video/Graphics Adapter Synchronization | |
| Audio/Video Decode Synchronization | |
| Head-end/Decoder Synchronization | |
| Synchronization Requirements | |
| Call to Action: Watch a Movie Using Your Adapter |
Removing video artifacts from PC-based video solutions is crucial, because television viewers tend to watch for extended periods of time (30 hours per week on average). For PC-based "digital television" to be acceptable, it must provide a better viewing experience than a normal television. This is especially true given the cost differential of a PC versus a television. The bottom-line question in considering PC video remains, "Is the picture quality better than television?" Otherwise, why spend $2,500 on a PC-TV system for a picture that is worse than a $250 television set? Video artifact problems must be solved in order for the choice to be PC video.
For analog television sources that originate from terrestrial television stations, cable television systems, VCRs, laser discs, game consoles, and so on, the VGA frame rate must still be synchronized to the incoming video frame rate. Essentially, the VGA must run at the video frame rate (59.94 Hz for NTSC, 50 Hz for most PAL flavors) or at an exact multiple of the video frame rate (for example, 119.88 Hz or 100 Hz).
Because many monitors exhibit problems running at 50 Hz, the best choice for PAL countries is 100-Hz monitor support. The reason that the VGA rate must be adjusted to keep pace with the incoming video rate is to avoid accumulated frame rate decoding errors that show up as dropped or duplicated video frames. This is explained in more detail in "Video/Graphics Adapter Synchronization" later in this article.
For MPEG data sources that originate from digital broadcast satellite or other broadcast MPEG media, the video and audio decoding must be synchronized. Viewable MPEG decoding requires three separate synchronization methods to properly decode video and audio. The three synchronization components deal with video/VGA synchronization, audio/video decoder synchronization, and the head-end/decoder synchronization for digital television. All of these methods must be perfected to properly display video without duplicating or dropping frames.
The most important synchronization issue is to lock the VGA display rate to the video field rate. In order to avoid dropping or duplicating decoded video fields, the VGA frame rate must be genlocked to the incoming video stream. (Genlocking is a method of synchronizing the output rate of one video display device to another in such a way that they both run in tandem.)
This is key to avoiding dropped-field and duplicated-field artifacts during decoding. Any synchronization solution that fails to handle this issue will cause artifacts that even analog TV viewers will find unacceptable. Even the slightest discrepancy between the decoder's field rate and the VGA frame rate will eventually result in dropped or duplicated fields. Mismatched video rates are unacceptable and will not meet user expectations or Microsoft Windows Logo Program requirements.
Arbitrarily mismatched VGA frame rates, such as 72 Hz, will exhibit noticeable "beat frequency" artifacts that show up as repeating patterns at the difference between the frame rates. The best test is to look at a smooth-scrolling video image such as the CNN stock ticker. Just compare the PC solution against any standard television. The VGA video image should exhibit no jumpiness or "jitter," and should scroll very smoothly across the screen even for very long periods of time (several hours).
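As a rough illustration of the beat-frequency arithmetic, the following sketch computes how often a free-running display would drop or duplicate a field. The function names are hypothetical, and the NTSC rate is taken as the 59.94 Hz figure used in this article rather than the exact broadcast value.

```python
from fractions import Fraction


def beat_frequency_hz(display_hz, video_hz):
    """Return the beat frequency between a display rate and a video rate.

    The display rate is compared against the nearest integer multiple of
    the video rate, because exact multiples (such as 119.88 Hz for NTSC)
    neither drop nor duplicate fields.
    """
    display = Fraction(str(display_hz))
    video = Fraction(str(video_hz))
    multiple = max(1, round(display / video))
    return abs(display - multiple * video)


def seconds_between_artifacts(display_hz, video_hz):
    """Return seconds between dropped/duplicated fields, or None if matched."""
    beat = beat_frequency_hz(display_hz, video_hz)
    return None if beat == 0 else float(1 / beat)


if __name__ == "__main__":
    for rate in (72, 60, 119.88):
        print(rate, float(beat_frequency_hz(rate, 59.94)),
              seconds_between_artifacts(rate, 59.94))
```

Under these assumptions, a 72-Hz display duplicates about 12 fields every second, while a 60-Hz display duplicates one field roughly every 17 seconds, which is the kind of periodic jump that a smooth-scrolling ticker makes visible.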
Microsoft recommends that the incoming digital video stream be used to perform a digital genlock synchronization of the VGA frame rate to the video. There are many ways to achieve such synchronization, including dynamically adjusting the VGA phase-locked loop (PLL) values or adjusting the dot clock of the SVGA using a voltage-controlled crystal oscillator (VCXO). (A PLL is a clock circuit that maintains a set frequency and that can be adjusted to output fractions of the primary clock rate.) However, the best methods use the horizontal/vertical synchronization data in the incoming digital stream to synchronize the VGA.
It is also necessary to double or triple buffer the incoming video in overlay frame buffers to avoid tearing, but this will not by itself prevent frame skipping or duplication. Therefore, while a particular system design must have enough memory to allow for double or triple buffering, the system also needs to synchronize the VGA frame rate to the decoded video frame rate. This synchronization is also referred to as "frame lock" or "digital genlock" synchronization.
Up to this point we have been discussing incoming video and the VGA output signal. There is another case worth considering. If the VGA output is connected to an NTSC/PAL/SECAM video encoder for display on large-screen televisions, you will have similar problems if you don't match the incoming television video rate. In both cases (VGA monitor and regular television), the output rate should depend on the rate of the incoming video signal.
Potential solutions for video/graphics adapter synchronization include:
| • | Modify the PLL circuitry in the VGA chip set to use the video vertical synchronization (Vsync) to stay in synch. |
| • | Adjust the number of scan lines or horizontal pixels dynamically to speed up or slow down the VGA frame rate. |
| • | Under software control, adjust the VGA PLL constant dynamically to speed up or slow down the VGA frame rate. |
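The software-controlled option can be sketched as a small control loop. This is only a simulation under stated assumptions: `next_refresh_hz`, `simulate_lock`, the gain, and the per-update step limit are all hypothetical, and a real driver would translate the resulting rate into PLL register values for its particular chip set.

```python
def next_refresh_hz(current_hz, phase_error_s, video_hz,
                    gain=0.5, max_step_hz=0.01):
    """Nudge the display refresh rate toward the video field rate.

    phase_error_s > 0 means the display Vsync lags the video Vsync, so the
    display must briefly run faster than the video to catch up.
    gain (1/s) and max_step_hz are illustrative values, not recommendations.
    """
    # Match the video rate, plus a small offset that removes the phase
    # error over roughly 1/gain seconds.
    target_hz = video_hz * (1.0 + gain * phase_error_s)
    # Limit each change so the monitor never sees an abrupt rate jump.
    step = max(-max_step_hz, min(max_step_hz, target_hz - current_hz))
    return current_hz + step


def simulate_lock(start_hz, video_hz, updates):
    """Run the loop once per video field; return (final_hz, final_error_s)."""
    refresh_hz = start_hz
    error_cycles = 0.0  # video phase minus display phase, in field periods
    dt = 1.0 / video_hz
    for _ in range(updates):
        refresh_hz = next_refresh_hz(refresh_hz, error_cycles / video_hz,
                                     video_hz)
        error_cycles += (video_hz - refresh_hz) * dt
    return refresh_hz, error_cycles / video_hz


if __name__ == "__main__":
    print(simulate_lock(60.0, 59.94, 2000))
```

In this simulation a display that starts at 60 Hz settles onto the 59.94-Hz video rate, and the accumulated phase error decays toward zero instead of growing into a dropped or duplicated field.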
The audio and video streams must be decoded and presented in tandem to ensure "lip synch." The decoders can be independent processors, or they can be linked as one unit. In either case, some form of regular A/V synchronization must be enacted to keep them in lock step. If you fail to keep the audio and video decoding synchronized, the viewer will notice that audible effects happen before or after they should, which is not acceptable.
The audio decoding is considered more significant, because any discontinuities in the audio are readily apparent, whereas video is less sensitive. Therefore, the audio decoder's system time clock (STC) must be used regularly to reset the video STC to keep them in lock step.
The video decoder can be adjusted more easily than the audio decoder, because a video field is emitted only once every 60th of a second (about 16.7 ms). The video STC can be adjusted between video fields, provided that the adjustment is not significant.
The key to adjusting the video STC is to ensure that audio and video never get out of synchronization by more than one field time (about 16.7 ms). The adjustment must never result in a dropped or duplicated video field. It is recommended that the audio and video STC values be synchronized regularly, at least once per second.
It is imperative that the decoding process neither drops nor duplicates audio samples or video fields. This must be true even for extended decoding periods (many hours to days). It is recommended that MPEG test suites be used to verify this. One potential test is to encode continuous motion and frame numbers into an MPEG test set that can be decoded and output as NTSC. This NTSC stream can then be recorded and played back on a VCR in slow motion to verify the accuracy of the decoding process.
Potential solutions for audio/video decode synchronization include:
| • | Use an integrated audio/video decoder chip that handles synchronization internally. |
| • | Designate the audio decoder as the master clock in the DirectShow graph, and synchronize the other components to it. |
| • | Copy the current STC from the audio decoder to the video decoder periodically. |
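The periodic-copy approach in the last item might look like the following sketch. It assumes that both STC values are expressed in 90-kHz ticks (the resolution of MPEG presentation time stamps), uses a nominal 60-Hz field rate, and ignores counter wraparound; the function names are hypothetical.

```python
STC_HZ = 90_000              # assumed: STC values in 90 kHz ticks
FIELD_TICKS = STC_HZ // 60   # one field time at a nominal 60 Hz


def resync_video_stc(audio_stc, video_stc):
    """Copy the audio STC to the video decoder between fields.

    Returns (new_video_stc, drift_ticks, within_one_field). The flag is
    False when the clocks had already drifted by a field or more, which
    means the resync interval was too long to avoid a visible artifact.
    """
    drift = video_stc - audio_stc
    return audio_stc, drift, abs(drift) < FIELD_TICKS


def seconds_until_one_field_drift(relative_error_ppm, field_rate_hz=60):
    """Time for two free-running clocks to drift apart by one field."""
    return (1.0 / field_rate_hz) / (relative_error_ppm * 1e-6)


if __name__ == "__main__":
    print(resync_video_stc(900_000, 900_300))
    print(seconds_until_one_field_drift(200))
```

With two clocks that differ by 200 ppm, a full field of drift takes over a minute to accumulate, so resynchronizing at least once per second keeps each correction far smaller than one field time.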
Because the MPEG data streams are broadcast from the data source (such as the digital broadcast satellite head-end) without any form of flow control, the decoder must be adjusted to avoid accumulated clock error. Typically less than a second's worth of MPEG buffers will be in the system. This tight decoding tolerance requires that the decoder crystal be continually adjusted to compensate for any rate inaccuracies.
In the case of video CD, disk files, DVD, and other mass storage devices, the rate of decoding is driven solely by the decoder, and MPEG data is "pulled" from the device on an as-needed basis. With digital broadcast television, DVB, and other broadcast MPEG data sources, the rate of decoding must match the rate of encoding at the head-end. Data is "pushed" to all of the clients at some arbitrary rate, without any flow control.
In the push-model case, MPEG data also contains periodic system clock reference time stamps (SCRs) that provide clues about whether to increase or decrease the rate at which the system time clock (STC) advances. The rate of the STC determines the rate of decoding. If the decoder runs out of data or fills all of its input buffers, there will be visible decoding discontinuities as the picture decoding halts or incoming data is discarded. Both of these cases are catastrophic.
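One way to read those clues is to compare how far the SCR values advance against how far the local STC advances over the same interval. The sketch below assumes that both clocks are sampled in the same units (for example, 27-MHz ticks) and that wraparound has already been handled; `clock_error_ppm` is a hypothetical helper.

```python
def clock_error_ppm(scr_first, scr_last, stc_first, stc_last):
    """Estimate the local clock error relative to the head-end, in ppm.

    A positive result means the local STC is running fast and should be
    slowed down; a negative result means it is running slow.
    """
    scr_elapsed = scr_last - scr_first
    stc_elapsed = stc_last - stc_first
    if scr_elapsed <= 0:
        raise ValueError("SCR samples must advance")
    return (stc_elapsed - scr_elapsed) / scr_elapsed * 1e6


if __name__ == "__main__":
    # One second of head-end time during which the local 27 MHz clock
    # counted 2,700 ticks too many.
    print(clock_error_ppm(0, 27_000_000, 0, 27_002_700))
```

A longer measurement interval averages out jitter in SCR arrival times, at the cost of reacting more slowly to a real rate difference.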
Most decoder crystals in mass production have a rated accuracy of ±100 parts per million (ppm), which on a 27-MHz crystal equals ±2,700 Hz. The broadcast encoders typically use very accurate crystals (<5 ppm). To maintain the correct rate of MPEG decoding, the decoder crystal must be adjusted to keep pace with the MPEG encoder's crystal.
The VCXO should therefore be adjustable in very fine steps over the entire ±2,700-Hz range (assuming crystals rated at ±100 ppm at worst). This range might need to be significantly larger if there are out-of-specification crystals in the production lots. Regardless of the overall range of the adjustment, the rate of adjustment should be no more than 0.5 Hz per second, and each adjustment step should be 0.05 Hz or smaller.
The same decoder crystal is also used to drive the NTSC encoder. Because the NTSC signal contains a color-burst reference signal, any wild VCXO adjustments made during the NTSC encoding process can cause noticeable color distortions ("rainbows"). For this reason, the VCXO changes must be very fine (<0.05 Hz) and must not be made too often (around 10 per second).
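The limits above can be combined into a simple planning calculation. This sketch assumes a step size of exactly 0.05 Hz, at most 10 steps per second, and a ±2,700-Hz range; the constant and function names are hypothetical.

```python
STEP_HZ = 0.05               # size of each VCXO adjustment
MAX_STEPS_PER_SECOND = 10    # gives a slew limit of 0.5 Hz per second
RANGE_HZ = 2700.0            # +/-100 ppm on a 27 MHz crystal


def plan_vcxo_adjustment(current_offset_hz, target_offset_hz):
    """Return (steps, seconds) needed to move the VCXO to a new offset.

    The target is clamped to the adjustable range. A negative step count
    means the crystal must be slowed down.
    """
    target = max(-RANGE_HZ, min(RANGE_HZ, target_offset_hz))
    steps = round((target - current_offset_hz) / STEP_HZ)
    seconds = abs(steps) / MAX_STEPS_PER_SECOND
    return steps, seconds


if __name__ == "__main__":
    print(plan_vcxo_adjustment(0.0, 1.0))
    print(plan_vcxo_adjustment(0.0, 5000.0))
```

Under these assumptions, a 1-Hz correction takes 2 seconds, while sweeping from the nominal frequency to the edge of the range takes 5,400 seconds (90 minutes). A design following these limits therefore has to make corrections continuously in small increments, because it cannot catch up quickly after a large error has built up.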
Potential solutions include capturing and time-stamping the incoming SCR values in hardware on the receiver card and then later time-stamping them again (adjusting for elapsed time on the board) with a high-resolution PC system clock [KeQueryPerformanceCounter()] when the device driver reads them from the receiver card.
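The elapsed-time correction described above reduces to a small amount of arithmetic, sketched here. The parameter names are hypothetical: the board is assumed to expose a free-running counter that is latched when each SCR arrives, and the PC tick values stand in for what a driver would obtain from KeQueryPerformanceCounter().

```python
def scr_arrival_pc_ticks(board_ticks_at_capture, board_ticks_now, board_hz,
                         pc_ticks_now, pc_hz):
    """Back-date an SCR capture to the PC's high-resolution clock.

    The receiver card latches its own counter when the SCR arrives. When
    the driver later reads the SCR, it also reads the current board counter
    and the PC counter, then subtracts the time the SCR spent waiting on
    the board.
    """
    board_elapsed = board_ticks_now - board_ticks_at_capture
    pc_elapsed = round(board_elapsed * pc_hz / board_hz)
    return pc_ticks_now - pc_elapsed


if __name__ == "__main__":
    # SCR waited 27,000 ticks of a 27 MHz board clock (1 ms) before the
    # driver read it; the PC counter runs at 1 MHz in this example.
    print(scr_arrival_pc_ticks(1_000, 28_000, 27_000_000,
                               5_000_000, 1_000_000))
```

Pairing each SCR with a back-dated PC time stamp removes the variable delay between arrival and the driver's read, so that delay does not show up as apparent clock jitter.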
Windows Logo Program requirements are available at http://www.microsoft.com/whdc/winlogo/downloads.mspx.
Hardware manufacturers must study these issues closely and work to resolve video quality problems they currently have. The solutions are achievable. Specifically, accept this challenge:
Watch an entire movie or television show using your current adapters.
Does the video skip? Is the motion jumpy or irregular? Do you get a headache after a few minutes?
If this experience is not 100% satisfactory, determine why and explore the innovations that will make for a better experience.