Bill Birney and David Workman
Microsoft Corporation
May 2003
Abstract
Learn how to work with pixel and frame shapes to optimize quality when you convert digital video from a camcorder or from other video sources that use nonsquare pixels. This document describes aspect ratios and resolutions, the basics of how video frames are constructed, and how to set up Windows Media Encoder 9 Series to use nonsquare pixels. You can also download a set of high-resolution test files with which you can encode sample content to help you understand how to work with these properties.
Introduction
The shapes of frames have been standardized by the film and video industries for compatibility with all of the hardware that is used to produce and play back the media. If a video producer wants to create a custom shape, the video producer has to do it within the boundaries that are set by the standards. For example, if a producer wants a video frame to be the same shape as a widescreen film, the producer can use the letterbox technique to add black bars above and below the frame. However, even though the letterbox technique creates a widescreen frame for the image, the actual video frame is not changed. When the video is played on a television, the image uses less of the available screen area and resolution. In other words, to get the framing that he or she wants, a producer must give up image detail and sharpness, in effect losing 14 percent of the available image. If the video is digitized, the black bars require as much storage and bit rate as any other part of the video.
With Windows Media, you do not have the compatibility issues that you have with video. You can create any number of rectangular frame shapes and sizes without adding black bars because you can change the actual shape and size of the frame. For example, you could encode a long, narrow video frame and embed it in a banner, as illustrated in Figure 1.
Figure 1. Video frame embedded in a banner
This document is about how to determine and modify the shape and size of a video frame by using the functions and features that are available in Windows Media Encoder 9 Series. The document also includes a link to download sample AVI files with which you can try different settings and techniques.
The encoder provides three ways to set the shape and size of a frame. You can change resolution or frame shape by modifying the number and arrangement of pixels (the tiny blocks of color that make up a digital image), by cropping or cutting off rows of pixels from one or more edges of a frame, or by modifying the aspect ratio or rectangular shape of the pixels.
By working with the shapes of frames and pixels, you can:
Capture or convert digital video with the best resolution. The CCIR-601 (Consultative Committee for International Radio) standard uses pixels of different shapes (nonsquare pixels) to standardize conversion from the various international analog video standards to digital video. If you capture video digitally by using either IEEE 1394 (Apple FireWire) or SDI (Serial Digital Interface) ports, or if you convert files that contain digital video, it helps to know how nonsquare pixels work so that you can optimize the quality of your video.
Optimize bit rate and file size. You can use a technique that has been used in the film industry for many years both to reduce the bit rate and file size of digital video and to create a widescreen frame. This method is called the anamorphic technique.
Create custom frame shapes. Understanding how to use cropping, resizing, and pixel shapes in Windows Media Encoder 9 Series can help you create custom frame shapes that better integrate with a visual design.
A frame is the container that holds a video image, and it has two properties: a shape and a size. Almost all video frames have a rectangular shape, the proportions of which are determined by their aspect ratio. The standard video aspect ratio is 4:3, meaning that the standard video frame has a width of 4 units and a height of 3 units (also expressed as 1.33:1). Other common aspect ratios are 16:9 (1.78:1) for high definition television, 1.85:1 for standard theatrical films, and up to 2.35:1 for widescreen theatrical films.
The size of a frame that is projected on a theater screen or scanned on a television is measured in either feet or inches. The size of a video frame on a computer monitor is often measured in pixels because the actual image size on a monitor varies, according to the type of monitor and how a user has configured the display. For example, a video frame of 640 pixels by 480 pixels might appear 6 inches high with a video display setting of 800 x 600, and 3 inches high with a video display setting of 1280 x 1024.
Resolution is the measure of image detail that is available in a frame. The more detail that a frame contains, the larger you can make the size without seeing visual artifacts. For example, a 35 millimeter film frame is capable of much higher resolution than a standard-definition broadcast video frame. You can easily see the difference between a 35 millimeter film frame and a standard definition broadcast video frame when both images are projected side-by-side on a large theater screen. With analog images (such as a film or an analog video frame), the amount of available resolution is determined by mechanical and electronic limitations (such as the quality of the lenses, film stock, and processing hardware, as well as by the standards that are used). With digital images, resolution is determined by the number of pixels. Therefore, different methods are used to measure the resolution of analog and digital video.
The construction of a video frame is based on the analog television standards that are used throughout the world. The three world standards include:
The National Television System Committee (NTSC) standard that is used primarily in North America, parts of South America, and Japan.
The Phase Alternating Line (PAL) standard that is used primarily in Europe and Asia.
The Séquentiel couleur à mémoire (SECAM, "sequential color with memory") standard that is used primarily in France and Eastern Europe.
The basic differences between the standards include the frame rate (number of frames per second), the resolutions, and the methods that are used to convey color information in the final, composite analog video signal.
Although there are differences between the three television standards, they do have many basic properties in common. Images are scanned from the top-left corner to the bottom-right corner in a series of horizontal lines. For the signal to be reconstructed by a television, synchronization and color information signals, or pulses, are added to the scanned lines to create a composite video signal. In addition, each frame consists of two fields that are each scanned in alternate lines. The first field that is scanned starts with line number one and continues with the odd-numbered lines to the bottom of the frame. Then, the even-numbered lines are scanned to create the second field, and the two fields are displayed together as one frame. Because the fields are woven together, this method of scanning is called interlace.
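The interlace scheme described above can be sketched in a few lines of Python. This is only an illustration: a frame is modeled as a plain list of scan lines, and the function names `split_fields` and `weave` are hypothetical, not part of any Windows Media tool.

```python
def split_fields(frame_lines):
    """Split a frame into two fields.

    Scan lines are numbered from 1, so the first field holds lines
    1, 3, 5, ... and the second field holds lines 2, 4, 6, ...
    """
    first_field = frame_lines[0::2]   # odd-numbered lines
    second_field = frame_lines[1::2]  # even-numbered lines
    return first_field, second_field


def weave(first_field, second_field):
    """Interleave two fields back into one frame."""
    frame = []
    for i in range(len(first_field) + len(second_field)):
        source = first_field if i % 2 == 0 else second_field
        frame.append(source[i // 2])
    return frame


lines = list(range(1, 8))             # a toy frame with scan lines 1..7
first, second = split_fields(lines)
print(first, second)                  # [1, 3, 5, 7] [2, 4, 6]
print(weave(first, second) == lines)  # True
```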
Both the 4:3 frame aspect ratio and the methods that are used to measure resolution are common to all standards. Resolution is measured by the number of lines that are scanned, and it is expressed in both vertical and horizontal resolutions. Because a frame contains a standard number of horizontal lines, vertical resolution can be no larger than 525 lines for NTSC, and 625 lines for both PAL and SECAM.
Horizontal resolution is also measured by the number of lines, even though the frame is not scanned vertically. This measurement is gauged visually by using a tool, such as a resolution chart, which is displayed in Figure 2.
Figure 2. Chart used to measure resolution
To measure the horizontal resolution of a video camera or system, the camera focuses on a full-frame shot of the chart. Then, the markings on the chart are read on a video monitor. The area immediately above the center square is enlarged in Figure 3.
Figure 3. Enlarged section of resolution chart showing resolution lines
To read the horizontal resolution, note the point where the converging lines appear to blend together. The number that is next to that point is the maximum horizontal resolution. The signal from a high-quality video camcorder can have a resolution of over 600 lines, compared to the signal from a consumer VCR, which can have approximately 240 lines.
If you were to zoom into one line of video and view about 1/10th of that line, Figure 4 illustrates what you might see, assuming that the image in the frame is irregular vertical stripes.
Figure 4. Sections of scan lines showing different resolutions
The top line represents what you would see if the horizontal resolution was perfect. In practice, however, the best resolution you will be able to produce is high resolution, as in the second line. You may often have to use low resolution, as in the third line, in which a substantial amount of detail is lost. The fourth line represents the video line after the low-resolution image is digitized into pixels.
To convert analog video to a digital format, the scan lines are sampled at regular intervals and a series of pixels is created. You can see an example of this in the fourth line of Figure 4. Each pixel is a very small rectangle of color that represents the analog image at that point in time. The pixel is the smallest unit of a digital image; the more pixels that a frame contains, the more detail is in that frame, and the higher is the resolution. When you begin shooting with a digital camera or capturing from an analog source, you can often choose the resolution that you want, which determines how many samples are taken for each analog frame.
The quality of the digitized video is based on the number of samples that are taken (pixels), the amount of data that is used to record each sample, and the quality of the process that is used to make the samples (the analog-to-digital converter in the capture card, for example). With analog video, resolution is degraded as the video signal is amplified, processed, stored, or transported over long distances. With digital video, resolution does not change unless you explicitly change or lose the data that describes the pixels.
Digital video must follow many of the standards that are set for analog video because digital video is often converted from or to an analog signal. Consequently, a pixel is the height of 1 line of analog video. When you convert digital video from NTSC video, which has 525 lines, 39 lines are removed because they contain synchronization pulses, which are not necessary for digital video. During capture, an additional 6 lines are also removed to make the number of digital lines, or horizontal rows of pixels, an even 480 lines. If you use the analog standard aspect ratio of 4:3, you arrive at a standard definition resolution of 640 pixels by 480 pixels. Other common resolutions are 320 x 240 and 160 x 120. For PAL, the resolution is 768 pixels by 576 pixels. Digital video is not restricted by the same types of standards as analog video. Windows Media Encoder can encode files and streams with any number of different resolutions and aspect ratios. For example, you could easily encode video in the high definition video aspect ratio of 16:9.
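The arithmetic in this paragraph can be checked with a short sketch. The constant and function names are illustrative; the numbers come straight from the text.

```python
# Line counts quoted in the text for NTSC video.
NTSC_TOTAL_LINES = 525
SYNC_LINES = 39          # lines that carry synchronization pulses
CAPTURE_TRIM_LINES = 6   # additional lines removed during capture


def square_pixel_width(rows, aspect_w=4, aspect_h=3):
    """Width, in square pixels, of a frame with the given number of rows."""
    return rows * aspect_w // aspect_h


ntsc_rows = NTSC_TOTAL_LINES - SYNC_LINES - CAPTURE_TRIM_LINES
print(ntsc_rows, square_pixel_width(ntsc_rows))  # 480 640
print(576, square_pixel_width(576))              # 576 768
```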
Two variables determine the shape of a frame: the frame aspect ratio (4:3, 16:9, 2:1) and the resolution (640 x 480). If you change the resolution, the frame aspect ratio can change, and vice versa. There is also a third variable, pixel aspect ratio. The resolution of 640 x 480 for a frame aspect ratio of 4:3 is based on a pixel that is square. In digital video, however, anything is possible. Unlike analog video, where you must adhere to rigid standards, the encoder and Windows Media Player can easily render any number of frame and pixel aspect ratios.
Nonsquare pixels are still the height of one analog scan line, but the width of a pixel can be any size. Source content can have a very high horizontal resolution and use a large number of narrow pixels, for example, or a reduced horizontal resolution, by using a wider pixel and fewer pixels per line. You could just as easily use a nonsquare pixel with an aspect ratio of 2:1 as a square pixel with a 1:1 aspect ratio. This is illustrated in Figure 5.
Figure 5. A square and a nonsquare pixel
A frame is determined by its shape and size. Frame aspect ratio and pixel aspect ratio determine the shape, and resolution determines the size. Typically, you configure these settings in the encoder to match the settings in the source content, rather than change the source content to fit a particular pixel aspect ratio or resolution.
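The relationship between the three variables can be written as one formula: the displayed frame aspect ratio equals the width in pixels, multiplied by the pixel aspect ratio, divided by the height in pixels. The following minimal sketch uses exact fractions; the function name `frame_aspect` is hypothetical.

```python
from fractions import Fraction


def frame_aspect(width, height, par=Fraction(1, 1)):
    """Displayed frame aspect ratio = (width * pixel aspect ratio) / height."""
    return Fraction(width) * Fraction(par) / height


# Square pixels: 640 x 480 is a 4:3 frame.
print(frame_aspect(640, 480))                  # 4/3
# Half as many pixels per line, each twice as wide: still a 4:3 frame.
print(frame_aspect(320, 480, Fraction(2, 1)))  # 4/3
```

The second call shows why resolution alone does not fix the shape of a frame: the pixel aspect ratio has to be known as well.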
The following two sections describe how to use aspect ratios, resolution, and pixels when you convert digital video to create custom shapes for design purposes or when you use the anamorphic technique to reduce bit rate and file size.
With the introduction of the CCIR-601 standard for digital video, a horizontal resolution of 720 pixels was established with vertical resolutions of 576 pixels for PAL video and 480 pixels for NTSC video. For both standards to display with the correct 4:3 frame aspect ratio, nonsquare pixels are used. To achieve the correct aspect ratio for NTSC video (720 x 480), the pixel aspect ratio is 10:11; for PAL (720 x 576), the pixel aspect ratio is 12:11, as in Figure 6.
Figure 6. Nonsquare pixels used in digital video
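As a rough check of these ratios, the sketch below computes the displayed frame aspect ratio for both standards. Note one assumption that is not stated in this article: the nominal 10:11 and 12:11 ratios yield exactly 4:3 for a 704-pixel-wide picture area, a commonly cited figure for the part of the 720-pixel line that carries the image. Across the full 720 pixels the frame comes out slightly wider than 4:3.

```python
from fractions import Fraction

# Pixel aspect ratios and row counts quoted in the text.
PAR = {"NTSC": Fraction(10, 11), "PAL": Fraction(12, 11)}
ROWS = {"NTSC": 480, "PAL": 576}


def displayed_aspect(width, standard):
    """Frame aspect ratio for a line of `width` nonsquare pixels."""
    return width * PAR[standard] / ROWS[standard]


for standard in ("NTSC", "PAL"):
    for width in (720, 704):  # 704 is an assumption, see the note above
        ratio = displayed_aspect(width, standard)
        print(standard, width, ratio, round(float(ratio), 3))
```

For both standards, the 704-pixel case evaluates to 4/3 and the 720-pixel case to 15/11 (about 1.364).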
To properly capture, convert, or play back digital video on a computer, therefore, the program you use must be capable of handling nonsquare pixels. For example, when you capture from a digital camcorder through the IEEE 1394 (Apple FireWire) interface, a nonsquare pixel-capable program will typically give you the option to capture the video in two ways:
Convert to square pixels. Typically, the program, by default, automatically converts the video to square pixels if you capture the video without making changes to the shape of the frame. The conversion maintains the 4:3 aspect ratio by resampling the horizontal pixels. For example, if you choose to capture with a horizontal resolution of 640, the program resamples the original 720 pixels per line to arrive at 640 pixels per line. The main disadvantage of using this approach is that you lose a good amount of your horizontal resolution and detail. Also, if the algorithm that you use to resample the pixels is of low quality, the final video can contain other artifacts.
Capture at full resolution. If you use a program, such as Microsoft VidCap32, to capture digitally to an AVI file at the full 720 x 480 resolution, the program does not perform any conversion or resample the horizontal pixels. The final AVI file contains all of the data to render a full 720 x 480 frame with nonsquare pixels. The only disadvantage is that AVI files do not contain pixel aspect ratio information, so many players, including Windows Media Player, render all AVI files with square pixels. Therefore, the frame appears stretched or squeezed horizontally when the file is played back.
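To make the first option concrete, the following sketch resamples one line of 720 samples to 640 samples by linear interpolation. It is only an illustration of the idea: the article does not say which algorithm a given capture program uses, and `resample_line` is a hypothetical name.

```python
def resample_line(line, new_width):
    """Linearly interpolate a list of sample values to new_width samples."""
    old_width = len(line)
    if new_width == 1 or old_width == 1:
        return [line[0]] * new_width
    # Map output positions 0..new_width-1 onto input positions 0..old_width-1.
    scale = (old_width - 1) / (new_width - 1)
    out = []
    for x in range(new_width):
        pos = x * scale
        left = int(pos)
        right = min(left + 1, old_width - 1)
        frac = pos - left
        out.append(line[left] * (1 - frac) + line[right] * frac)
    return out


# One scan line of alternating dark and bright samples (fine vertical stripes).
stripes = [0, 255] * 360                  # 720 samples
resampled = resample_line(stripes, 640)   # 640 samples
print(len(stripes), len(resampled))       # 720 640
```

Most output samples in this example are blends of a dark and a bright neighbor, which is the loss of horizontal detail that the text describes.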
If you plan to play your captured video as an AVI file, you must capture at 640 x 480 and enable the capture program to convert the pixels. A better solution, however, is to encode your video by using Windows Media Encoder 9 Series. You can capture digital video directly with the encoder or encode from an AVI file that contains video that was captured with nonsquare pixels. By using the encoder and Windows Media Player together, you can encode and play back the final stream or file with nonsquare pixels, so you can capture at full horizontal resolution without conversion. You can also encode and play back any number of frame sizes, shapes, and resolutions, including the common 16:9 aspect ratio that is used in high definition video.
Figure 7 shows how the size and shape functions work together in Windows Media Encoder, starting with the source content at the top and ending with an output image.
Figure 7. Size and shape functions in Windows Media Encoder 9 Series
The encoder enables you to crop and resize the frame, as well as specify the pixel aspect ratio. Cropping and resizing changes both the resolution and aspect ratio of the frame, but it does not affect the pixel aspect ratio. By using the cropping function, you can remove rows of pixels from one or more edges of the video frame. In Figure 7, 33 lines of pixels are cropped from the top and bottom to give the frame a 1.85:1 aspect ratio, which is the same ratio that is used in most feature films.
After you determine how you want to crop the frame, you can resize the image. In Figure 7, the frame was resized to the original aspect ratio and resolution, which results in an image that is stretched vertically. You can also specify the appropriate pixel aspect ratio. In Figure 7, a pixel aspect ratio that creates a long pixel was selected. This has the effect of stretching the image horizontally to match the aperture of the original image.
In most cases, you will not use every size and shape function at the same time. For example, you may only want to crop a frame. To do this, you would choose not to resize the image. If you do not resize the image, the aspect ratio remains the same after you either enter crop values or modify the pixel aspect ratio.
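The crop values for a target frame shape can be estimated with simple arithmetic. The sketch below assumes a 320 x 240 source with square pixels; the article does not state the resolution used in Figure 7, but that assumption reproduces the 33 lines mentioned above. The function name is hypothetical.

```python
import math


def crop_rows_per_edge(width, height, target_aspect, par=1.0):
    """Rows to remove from both the top and the bottom edge so that the
    frame is approximately target_aspect wide (rounded down)."""
    target_height = width * par / target_aspect
    if target_height >= height:
        return 0  # the frame is already at least as wide as the target
    return math.floor((height - target_height) / 2)


rows = crop_rows_per_edge(320, 240, 1.85)
print(rows)                              # 33
print(round(320 / (240 - 2 * rows), 2))  # 1.84
```

Because the crop is a whole number of rows, the result is close to 1.85:1 rather than exact.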
To capture or convert CCIR-601 digital video to a Windows Media file or stream, use the functions and process that are described in Figure 8.
Figure 8. The process for converting digital video
The procedure does not include cropping, although you could use it to, for example, trim off a border. Resizing is also not used, so the original resolution remains the same. The only property that you must set explicitly, rather than take from the source file, is the pixel aspect ratio, which is 10:11 for NTSC digital video.
The following procedure describes how to set up Windows Media Encoder 9 Series to encode a sample AVI file that has the same properties as digital video from a CCIR-601 source, such as a DV camcorder. The file is 1 of 29 sample files that you can download from the Microsoft Download Center Web site. The samples include test content that was captured in a variety of pixel and frame aspect ratios, as well as resolutions. Before you follow the procedure, download the files or use an AVI file that has a resolution of 720 x 480 and a 4:3 frame aspect ratio.
To encode a source file that contains CCIR-601 video:
1. Open the encoder and start a new session with the Capture a File Wizard.
2. Select the NTSC4X3_720X480_PAR10X11.avi sample file as the source, and then type a destination path and file name.
3. Click Windows Media server (streaming) for the content distribution method. This method is not important for this example, but the Windows Media server (streaming) option offers the most choices for bit rates.
4. In the Encoding Options box, click a bit rate, and then click Finish. For this example, you can click a bit rate of 300 Kbps.
There are a number of settings that you cannot make in the wizard. To configure these settings:
5. Open Session Properties, click the Compression tab, and then click Edit to open Custom Encoding Settings.
6. On the General tab, select the Allow nonsquare pixels check box.
7. On the bit rate tab, in the Video size box, select the Same as video input check box, and then click OK.
With this setting, the encoder does not resize the frame or change the resolution from that of the source, which is the NTSC CCIR-601 resolution of 720 x 480. If you were to encode the stream at this point, the frame would be wider than the 4:3 ratio and the picture would appear stretched horizontally, as in the Resize: none frame in Figure 8.
8. On the Video Size tab, in the Pixel aspect ratio box, click DV NTSC 4:3 (10:11), and then click Apply.
Even though there are 720 pixels running horizontally, each pixel is slightly narrower than a square pixel so that the image is displayed in the correct 4:3 ratio.
9. Encode the file.
As you encode the file, the Video panel displays the frames with square pixels. After the encoder finishes, open the file in Windows Media Player 9 Series. The video is rendered with nonsquare pixels and the correct frame aspect ratio.
With a horizontal resolution of 720, the image has more detail than it would with square pixels. Consequently, the output file is slightly larger and the video plays back at a higher bit rate. If this is a problem, you can capture by converting to square pixels and a lower resolution. To do this, resize the frame to a 4:3 ratio, such as 640 x 480, and do not select the Allow nonsquare pixels check box.
The anamorphic technique was first used in film to fit widescreen movies into a standard film frame. To make the system work, special lenses are used to squeeze the image into the frame, and then unsqueeze the image in the theater. You can use the technique with Windows Media files and streams by changing settings in the encoder to use nonsquare pixels. By using nonsquare pixels and resolution settings, you can create custom frame aspect ratios. There are two reasons why you might want to do this. First, you can use a custom frame shape and size as a creative tool to complement a design. For example, you could encode a video to play back with a widescreen aspect ratio.
Figure 9 shows a normal image with an aspect ratio of 16:9, and the same image squeezed to half its width.
Figure 9. Anamorphic video squeezed and the normal aspect ratio
Secondly, you can use anamorphic video to reduce bit rate and file size. The image on the left contains exactly half the data of the typical image. By cutting horizontal resolution in half and then using a pixel aspect ratio of 2:1, you greatly reduce the bit rate of the video stream. As an alternative, you can encode with the same bit rate that was used with the normal image, and then use the extra bit rate to increase the image quality and frame rate. Reducing image data does cause a reduction in horizontal resolution. However, you can use this technique in moderation to create a balance between the quality that you want and the bit rate that you must have.
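The trade-off can be illustrated with a small calculation. The sketch assumes a square-pixel 640 x 360 source, a hypothetical resolution chosen only because it is exactly 16:9; the function name `squeeze` is also hypothetical.

```python
from fractions import Fraction


def squeeze(width, height, factor):
    """Divide the width by `factor` and return (new_width, height, par),
    where par is the pixel aspect ratio that restores the original shape.
    Assumes the source uses square pixels."""
    if width % factor:
        raise ValueError("width must be divisible by factor")
    return width // factor, height, Fraction(factor, 1)


w, h, par = squeeze(640, 360, 2)
print(w, h, par)                  # 320 360 2
print((w * h) / (640 * 360))      # 0.5
print(Fraction(w) * par / h)      # 16/9
```

The squeezed frame stores half as many pixels, yet the 2:1 pixel aspect ratio makes it display with the same 16:9 shape.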
A frame aspect ratio of 16:9 is the standard that is used in high definition television. With a horizontal resolution of 720 pixels, you would need a vertical resolution of 405 pixels to achieve a 16:9 ratio with square pixels. However, by using nonsquare pixels and the anamorphic technique, you can keep the full 720 x 480 resolution and use a wide pixel shape, often without noticeable video artifacts. Figure 10 illustrates the encoding process.
Figure 10. Using nonsquare pixels for anamorphic video
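The pixel aspect ratio needed for a given resolution and target frame shape follows from rearranging the frame aspect formula: pixel aspect ratio = frame aspect ratio x height / width. The sketch below is illustrative; the function name is hypothetical, and the 704-pixel width in the last call is an assumption (a commonly cited width of the picture area within the 720-pixel line), not a value from this article.

```python
from fractions import Fraction


def required_par(width, height, frame_aspect):
    """Pixel aspect ratio that makes width x height display at frame_aspect."""
    return Fraction(frame_aspect) * height / width


widescreen = Fraction(16, 9)
print(required_par(720, 405, widescreen))  # 1 (square pixels)
print(required_par(720, 480, widescreen))  # 32/27
print(required_par(704, 480, widescreen))  # 40/33
```

The last result matches the DV NTSC 16:9 (40:33) setting in the encoder. The exact value for the full 720 pixels, 32:27, is slightly narrower.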
To encode a file by using the anamorphic technique, use the procedure in the previous section. For the source, use the NTSC16X9_720X480_PAR40X33.avi sample file, and then click DV NTSC 16:9 (40:33) in step 8.
The video that is in the sample AVI file has a frame aspect ratio of 16:9 with a resolution of 720 x 480. When you play back the AVI file in Windows Media Player, it is rendered with square pixels and appears squeezed. After you encode the AVI file with a pixel aspect ratio of 40:33, the video is stretched back to its original 16:9 shape. Because the amount of squeezing and unsqueezing is moderate, you are not as likely to notice image artifacts.
Try encoding the other sample files. The title of each file includes the analog video standard, frame aspect ratio of the video, resolution, and pixel aspect ratio. Use the procedure in the previous section to change the pixel aspect ratio to create final files that play correctly in Windows Media Player 9 Series.
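When you work through many sample files, it can help to read the properties from the file name. The sketch below infers the naming pattern from the two file names mentioned in this article (NTSC4X3_720X480_PAR10X11.avi and NTSC16X9_720X480_PAR40X33.avi). The pattern is an assumption and may not match every file in the download.

```python
import re
from fractions import Fraction

# Assumed pattern: <standard><aspect W>X<aspect H>_<width>X<height>_PAR<w>X<h>.avi
NAME_PATTERN = re.compile(
    r"^(?P<standard>[A-Za-z]+)(?P<faw>\d+)X(?P<fah>\d+)_"
    r"(?P<width>\d+)X(?P<height>\d+)_"
    r"PAR(?P<pw>\d+)X(?P<ph>\d+)\.avi$",
    re.IGNORECASE,
)


def parse_sample_name(file_name):
    """Return the properties encoded in a sample file name, or None."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "standard": g["standard"].upper(),
        "frame_aspect": Fraction(int(g["faw"]), int(g["fah"])),
        "resolution": (int(g["width"]), int(g["height"])),
        "par": Fraction(int(g["pw"]), int(g["ph"])),
    }


info = parse_sample_name("NTSC16X9_720X480_PAR40X33.avi")
print(info["standard"], info["frame_aspect"], info["resolution"], info["par"])
# NTSC 16/9 (720, 480) 40/33
```

The `par` value is the one to select in the Pixel aspect ratio box in step 8 of the procedure.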