GDI+: Next-Generation Graphics Device Interface

Draft March 10, 1999

Updated: December 4, 2001

This article describes some of the hardware implications of GDI+, the next-generation graphics device interface for Microsoft Windows XP and future versions of the Windows operating system. It provides an overview of GDI+ from a hardware perspective, and describes the GDI+ driver model, required and recommended acceleration support for display hardware, implications for supporting windows layering, support for CPU memory controller and display hardware integration, and implications of the GDI+ continuous tone model for printer hardware.

On This Page
Introduction
GDI+ and Windows Layering
Gamma on the Desktop
Driver Model
Displaced Subpixel Rendering
3-D User Interfaces
Next-Generation Shared Memory Architectures
Continuous Tone Model for Printers
Conclusion
References

Introduction

GDI+, formerly known as GDI2k, will be the next-generation GDI from Microsoft. GDI+ will create the infrastructure for new desktop user interface innovation, will permit easy integration of 2-D and 3-D, will bring digital imaging to the desktop, and will raise the bar on desktop graphics and performance. GDI+ will offer enhanced graphics capabilities such as alpha blending, antialiasing, texturing, and advanced typography and imaging. GDI+ will emphasize hardware acceleration with excellent visual quality.

GDI+ Vision
The primary focus of GDI+ is the graphical desktop experience. "Desktop" is defined as the standard graphics mode in which users run their desktop applications, as distinguished from "full-screen exclusive mode." It is fundamental to note that the vast majority of users spend most of their computing time at the desktop resolution running 2-D desktop applications. As such, the primary goal of GDI+ is to fundamentally improve that graphical desktop experience.

The following are the major components of the GDI+ vision:

The desktop should employ a composited painting model. It must be possible to compose the desktop by compositing together separate graphical windows using alpha blending and antialiasing. Individual windows should be able to seamlessly animate with no tearing, even in this composited model.

It must be possible to seamlessly mix 2-D, 3-D, video, and animation with high performance.

All rendering should be high quality, with pervasive alpha blending and antialiasing, gamma correct drawing and color correct images, high quality filtering, and high fidelity printing.

Because the desktop is a shared, heterogeneous environment composed of different processes and components drawing to different parts of the screen, solutions that work for exclusive-mode full-screen applications do not work for the desktop. This article addresses the implications of this vision.

GDI+ Hardware Acceleration Philosophy
The hardware acceleration philosophy of GDI+ can be summed up as follows: The performance of current 2-D drawing is effectively maximized, but there is much improvement that can be made in terms of quality. Visual quality is extremely important to the user experience, and quality will again demand good 2-D performance from the hardware.

GDI+ and Windows Layering

"Windows layering" is an idea for fundamentally advancing the desktop windowing model.

Why are windows on the desktop always opaque, rectangular shapes? Why does any animation always have to occur within the bounds of some rectangular window? Why can't the animation happen directly on the desktop, on top of other windows? Why can't windows have transparency? Why do most windows "tear" when there is any animation?

Windows layering addresses these questions. It is based upon one simple, fundamental idea: what if top-level windows on the desktop were separate, composited sprites, using per-pixel alpha to define the window's shape and opacity?

The advantages this brings are immense. It allows the following, for the first time:

Windows with arbitrary, animating shapes

Windows with translucency

Instantaneous window manipulation, with no redraws needed

Windowed animation with no tearing

To achieve this idea, GDI+ will support hardware-accelerated compositing on either the front-end or the back-end of the graphics accelerator. In either case, layers are described to the hardware using 32bpp pre-multiplied alpha ARGB surfaces.
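As a rough illustration of what compositing pre-multiplied alpha ARGB layers involves, the following minimal Python sketch applies the Porter and Duff "over" operator to single 8-bit pixels. The function names and the tuple layout (alpha, red, green, blue) are assumptions made for this example; they are not part of any GDI+ interface.

```python
def premultiply(a, r, g, b):
    """Convert a straight-alpha 8-bit ARGB pixel to pre-multiplied ARGB."""
    def scale(c):
        # Multiply the color channel by alpha/255 with rounding.
        return (c * a + 127) // 255
    return (a, scale(r), scale(g), scale(b))


def over(src, dst):
    """Composite pre-multiplied `src` on top of pre-multiplied `dst`.

    With pre-multiplied pixels every channel, alpha included, uses the
    same formula: result = src + dst * (1 - src_alpha).
    """
    inv = 255 - src[0]
    return tuple(s + (d * inv + 127) // 255 for s, d in zip(src, dst))


# A roughly half-transparent red layer over an opaque blue layer.
layer = premultiply(128, 255, 0, 0)
desktop = premultiply(255, 0, 0, 255)
blended = over(layer, desktop)
```

Because the color channels already carry the alpha factor, the blend needs only one multiply per channel, which is one reason pre-multiplied surfaces suit hardware compositing.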

Front-end Compositing
With the front-end solution, GDI+ will use the Blt and alpha-blending hardware of the graphics accelerator to double- or triple-buffer the screen and compose the desktop on a frame-by-frame basis. This will be dynamic; for example, if no layered windows are currently animating there is obviously no need to double buffer the entire desktop.

To eliminate tearing, GDI+ will leverage full-screen page flipping and scan-line triggered screen-to-screen Blts. Scan-line triggered screen-to-screen Blts are the preferred solution, and must happen asynchronously in the hardware--it is unacceptable to require either the CPU or the graphics accelerator to block until the event is initiated. It must also be possible to program the scan-line triggered Blts at high frequencies, such as at every vertical blank interrupt. It must be possible to have at least four rectangles with different triggers pending, but an arbitrarily sized queue is preferred. If triggers overlap in time such that an event is triggered before the operation for the previous event has completed, the subsequent event must not be dropped but should be executed when the first is complete. There must be a mechanism to allow the software to determine when a Blt has completed; a vertical blank counter would be sufficient.
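The queueing behavior described above can be pictured with a small software model. This is a toy sketch only: the class name, the method names, and the idea of measuring a Blt's cost in scan lines are all assumptions for illustration, and real hardware would expose this through a driver interface rather than Python objects.

```python
from collections import deque


class ScanlineBltModel:
    """Toy model of asynchronous scan-line triggered Blts (hypothetical)."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # at least four pending triggers are required
        self.pending = []          # programmed Blts waiting for their scan line
        self.triggered = deque()   # fired Blts waiting for the blitter
        self.current = None        # name of the Blt in flight, if any
        self.busy_lines = 0        # scan lines left on the Blt in flight
        self.completed = []        # finished Blts, in completion order

    def program(self, name, trigger_line, cost_lines):
        """Queue a Blt without blocking; return False if the queue is full."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((name, trigger_line, cost_lines))
        return True

    def scanline(self, line):
        """Advance the model by one scan line."""
        # Retire the Blt in flight once its cost has elapsed.
        if self.current is not None:
            self.busy_lines -= 1
            if self.busy_lines == 0:
                self.completed.append(self.current)
                self.current = None
        # Fire every trigger set for this line. A trigger that fires while
        # the blitter is busy waits in `triggered`; it is never dropped.
        for blt in [b for b in self.pending if b[1] == line]:
            self.pending.remove(blt)
            self.triggered.append(blt)
        # Start the next triggered Blt if the blitter is idle.
        if self.current is None and self.triggered:
            name, _, cost = self.triggered.popleft()
            self.current, self.busy_lines = name, cost


model = ScanlineBltModel()
model.program("window A", trigger_line=10, cost_lines=5)
model.program("window B", trigger_line=12, cost_lines=3)  # fires while A runs
for line in range(30):
    model.scanline(line)
```

In this run the second Blt fires while the first is still in flight and is executed as soon as the first completes. The `completed` list plays the role of the completion mechanism that software would poll.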

Back-end Compositing
Graphics accelerators may support layering in the DAC (digital-analog converter). GDI+ will provide services that allow the driver to hook the top n layers in the z-order, and render them in hardware. It is up to the hardware as to the number of layers it supports in the back-end; GDI+ will emulate any layers lower than the top n using front-end compositing. As bandwidth considerations warrant, the driver can dynamically ask GDI+ to request more or fewer overlays for the back-end. There will be no restrictions on overlay size, nor any restrictions as to how many overlays can be stacked over a single area of the screen, except so far as the driver has the ability to decline or accept overlays at any time.

Back-end layers do have the advantage of allowing separate per-layer gamma correction tables, which cannot be properly emulated using front-end compositing because of the global gamma. There need only be two lookup tables; one to adjust for an sRGB gamma and one to adjust for a 1.0 linear gamma. Layers will be marked to indicate which lookup table should be used. The output of the gamma correction table can be higher precision than that of the input in order to allow more precise color adjustment as detailed in the section on desktop gamma, although it is not required.
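The two lookup tables might look something like the following Python sketch. It assumes the display expects an sRGB-encoded signal, so the table for sRGB layers only widens the value while the table for 1.0 gamma layers applies the sRGB transfer function. The 10-bit output width, the dictionary keys, and the function names are illustrative assumptions.

```python
def srgb_encode(linear):
    """sRGB transfer function: linear light in [0, 1] to sRGB signal in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def build_layer_luts(out_bits=10):
    """Build one 256-entry table per layer gamma, with a wider output."""
    out_max = (1 << out_bits) - 1
    # Layers already in sRGB pass through; only the precision is widened.
    srgb_lut = [round(code / 255 * out_max) for code in range(256)]
    # Layers rendered with a 1.0 linear gamma are converted to sRGB.
    linear_lut = [round(srgb_encode(code / 255) * out_max) for code in range(256)]
    return {"srgb": srgb_lut, "linear": linear_lut}


def scan_out(layer_pixels, layer_gamma, luts):
    """Apply the table selected by the layer's marking to each channel."""
    lut = luts[layer_gamma]
    return [tuple(lut[c] for c in pixel) for pixel in layer_pixels]


luts = build_layer_luts()
video_layer = scan_out([(0, 128, 255)], "srgb", luts)
linear_layer = scan_out([(0, 128, 255)], "linear", luts)
```

The wider output lets the linear table place dark values more precisely than an 8-bit result could.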

Note that back-end video overlays with alpha are already a requirement of the Windows Logo Program; GDI+ does not remove this requirement.

Call to Action

For back-end compositing, support two gamma lookup tables.

For front-end compositing, support asynchronous scan-line triggered Blts.

Gamma on the Desktop

One of the goals of GDI+ is to allow high performance gamma corrected drawing. Most graphics hardware today interpolates in RGB space, but RGB space for most monitors does not have a linear relation to intensity as perceived by the eye--rather it's related to a gamma power function.

Looking at the default-color gradient-fill title bars of Microsoft Windows 2000 or Windows 98 vividly shows an example of such an artifact. One would naively expect that the halfway point in terms of color intensity between the two end colors would occur exactly in the middle of the title bar. But because the gradient fill is done using linear interpolation in nonlinear RGB space, the halfway point in terms of color intensity actually appears to be approximately 70% to one side of the title bar.
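The position of the perceived halfway point can be estimated with a short calculation. The sketch below assumes a pure 2.2 power function for the monitor and a gradient running from black to white; the actual title bar colors differ, so the number is only indicative.

```python
GAMMA = 2.2  # assumed monitor power function


def intensity(value):
    """Displayed intensity for a frame buffer value in [0, 1]."""
    return value ** GAMMA


def half_intensity_position(start, end):
    """Fraction of the way along a linear RGB ramp from `start` to `end`
    at which the displayed intensity is midway between the two ends."""
    target = (intensity(start) + intensity(end)) / 2
    value_at_target = target ** (1 / GAMMA)
    return (value_at_target - start) / (end - start)


position = half_intensity_position(0.0, 1.0)
```

Under these assumptions the halfway intensity falls at about 0.73 of the way along the ramp, in line with the figure of approximately 70% quoted above.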

Unfortunately, a programmable lookup table built into the RAMDAC that can effectively convert the entire desktop to "linear" RGB space, thus counterbalancing the nonlinear gamma of the monitor, is not sufficient. The problem is that even with 10 or more bits of precision per color channel as the output of the table, there is a loss in perceptual color. This is because the eye is more perceptive of dark areas than bright areas, and consequently needs more resolution for dark colors.

More specifically, the eye's response to intensity is roughly a cube-root function, so values encoded for the approximately 2.2 power function of a standard sRGB monitor already closely match the perceptual ability of the eye, much more so than values in a 1.0 linear RGB space. In addition, each increment of an 8-bit color channel with a gamma of about 2.2 approximately matches the smallest increment the eye can see; if the gamma is changed to 1.0, more bits per color channel would be needed.

The net result is that if a 1.0 gamma were programmed into the RAMDAC table, more than 8 bits per color channel in the frame buffer would be needed to get the perceptual equivalent of 8 bits per color channel with an sRGB gamma. The primary surface format likely cannot be changed to more than 8 bits per color channel any time soon, if only for reasons such as backwards compatibility with Microsoft DirectDraw applications.
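A back-of-the-envelope estimate of how many more bits are needed can be made as follows. The sketch assumes the piecewise sRGB transfer function and a uniformly quantized linear frame buffer that must resolve the smallest step between adjacent 8-bit sRGB codes; it ignores dithering and other mitigations.

```python
import math


def srgb_decode(signal):
    """Inverse sRGB transfer function: sRGB signal in [0, 1] to linear light."""
    if signal <= 0.04045:
        return signal / 12.92
    return ((signal + 0.055) / 1.055) ** 2.4


def linear_bits_needed(srgb_bits=8):
    """Bits per channel a linear buffer needs to separate every sRGB step."""
    top = (1 << srgb_bits) - 1
    levels = [srgb_decode(code / top) for code in range(top + 1)]
    smallest_step = min(b - a for a, b in zip(levels, levels[1:]))
    return math.ceil(math.log2(1 / smallest_step))


bits = linear_bits_needed()
```

Under these assumptions the estimate comes to 12 bits per channel, since the darkest sRGB steps are only about 1/3295 of full scale in linear terms.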

Full-screen applications have the choice as to whether the advantage of gamma correction outweighs the loss of perceptual color, and so can choose to set a linear 1.0 gamma or not. However, because of its shared environment, the global desktop cannot simply be made to have a linear 1.0 gamma because high fidelity image applications such as Adobe Photoshop would incur unacceptable color loss.

It is also unacceptable to employ a scheme where the foreground window gets to decide the linearity of the desktop gamma, because of the "palette flashes" that would result, and because in a composited desktop it is difficult to determine what the "foreground window" actually is.

Consequently, the base desktop will stay in a nonlinear gamma. The precise gamma chosen will be that defined in the sRGB specification. There are other advantages to having the desktop with an sRGB profile, including the ability to render most bitmaps without having to do any gamma correction, since most bitmaps already implicitly assume an sRGB gamma.

Given that the desktop is stuck with a non-linear gamma, the question arises as to how gamma corrected rendering may be accomplished. It is not expected that drawing primitives will have to gamma correct as they are rendered (with some exceptions, such as for antialiased text). Rather, it will be possible for applications to create their windowed rendering surfaces with a linear gamma, with the conversion to non-linear gamma being done when the completed frame is swapped to the screen.

The gamma correction may happen via the following means:

Particular applications can choose whether their windows will be gamma corrected or not, and with back-end windows layering the RAMDAC will gamma correct on the back end for those particular windows. Note that a limited form of this is already supported for video overlays in DirectDraw by the DdColorControl HAL (hardware abstraction layer) call. Note also that this solution does not work for front-end windows layering support.

Particular applications can choose to do their rendering to 1.0 gamma offscreen surfaces, which are then gamma corrected through a hardware Blt to the sRGB gamma screen. The gamma correcting Blt should be able to do the correction using an 8-bit lookup table; it can be assumed for efficiency that only one lookup table will be frequently used, mainly for the conversion of a 1.0 gamma to the display's gamma. The source format should handle 8-bit per channel RGB surfaces. (Note that with 8-bit per channel surfaces, this compromise results in significant color precision loss, although it does permit hardware accelerated rendering with gamma correction. For a discussion of the resulting error, see Dirty Pixels by Jim Blinn, IEEE Computer Graphics and Applications, July, 1989.)
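The second approach, a gamma-correcting Blt driven by a single 8-bit lookup table, can be sketched in Python as below. The table contents assume the sRGB transfer function; the function names and the row-of-tuples surface layout are assumptions for the example. The helper at the end makes the precision loss visible by listing sRGB codes that no 8-bit linear input can reach.

```python
def srgb_encode(linear):
    """sRGB transfer function: linear light in [0, 1] to sRGB signal in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


# One 256-entry table, built once: 8-bit 1.0 gamma in, 8-bit sRGB out.
LINEAR_TO_SRGB = [round(srgb_encode(code / 255) * 255) for code in range(256)]


def gamma_correcting_blt(src_rows):
    """Copy a 1.0 gamma RGB surface to sRGB, one table lookup per channel."""
    return [
        [tuple(LINEAR_TO_SRGB[c] for c in pixel) for pixel in row]
        for row in src_rows
    ]


def unreachable_srgb_codes():
    """sRGB output codes that no 8-bit linear input maps to."""
    produced = set(LINEAR_TO_SRGB)
    return [code for code in range(256) if code not in produced]


screen_rows = gamma_correcting_blt([[(0, 1, 255), (64, 128, 192)]])
gaps = unreachable_srgb_codes()
```

With this table the first nonzero linear code already maps to sRGB code 13, so the darkest sRGB levels cannot be produced at all. That is the loss of color precision that the compromise accepts.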

For a discussion of gamma correct rendering, see A Ghost In A Snowstorm by Jim Blinn, IEEE Computer Graphics and Applications, January/February 1998.

Call to Action

Build gamma-correcting lookup tables into layers, or implement a gamma-correcting Blt.

Driver Model

A common theme of GDI+ is the integration of 2-D and 3-D, from the API (application programming interface) to the DDI (device driver interface). On the DDI side, GDI+ will use existing 2-D and 3-D hardware acceleration capabilities by leveraging the Microsoft Direct3D command stream for all hardware acceleration. GDI+ will use a mixture of D3D and GDI+ command tokens for all its rendering. In this way, GDI+ will share a common driver interface with Direct3D, giving the following benefits:

GDI+ can freely intermix "3-D" and "2-D" rendering without incurring costly state changes.

There will be reduced code duplication in the driver from eliminating multiple ways to access the same hardware accelerations, with corresponding improvements in test coverage.

There will be more efficient resource utilization for writing drivers and doing optimization and debugging, resulting in better drivers.

GDI+ will define new tokens for primitives that are not already described by existing Direct3D and DirectDraw tokens. All GDI+ accelerated drawing primitives will be much "closer to the metal" than is so with GDI, with less abstraction of the hardware. For example, GDI+ might specify "draw this list of triangles" instead of "stroke this path with this geometric pen."

For backward compatibility considerations, the existing GDI DDI will continue to be supported.

Call to Action

Graphics accelerators should be prepared for mixtures of 2-D and 3-D drawing primitives in the command stream. It should not be expensive to do a context switch between "2-D" and "3-D" drawing primitives.

Displaced Subpixel Rendering

Future display developments are expected to take advantage of displaced subpixel rendering to improve rendering quality. For optimal quality, it should be possible for GDI+ to query the physical pixel attributes of the display device. It is recommended that the graphics device be able to convey the following information about each of the connected display devices:

1. Whether the display device is an LCD screen.

2. Whether the display device is connected to the graphics device via a digital connection, such as DVI. Digital connections for LCDs, such as the Digital Visual Interface (DVI), are preferred, since they provide more accurate displaced subpixel rendering quality. For more details on DVI, see http://www.ddwg.org.

3. Striping of the LCD, and whether it's horizontal or vertical. The pixels of most color LCDs are composed of red, green, and blue fragments in a horizontal or vertical orientation, and this configuration is known as the "striping." Vertical striping, where the screen is composed of vertical strips of red, green, and blue fragments, is preferred since it effectively provides three separate fragments in a row for every pixel, thereby giving more horizontal subpixel resolution.

4. Ordering of the LCD color fragments. The color fragments may be ordered R-G-B or B-G-R from left-to-right or top-to-bottom. Note that this is independent of the RGB ordering of the frame buffer.

5. Whether the LCD screen is operating in its native pixel resolution, or whether it is a scaled resolution. Because of the nature of displaced subpixel rendering, its quality is reduced when a pixel in the frame buffer is mapped to more than one physical pixel on the LCD screen. An example of this is when the native resolution is 1024x768, but the current mode is 640x480, and the device scales the output to fill the entire screen.

6. Gamma of the display. This may be any value from 1.0 to 2.4, although it is recommended that the display device use the sRGB gamma.
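To show how a renderer might use these attributes, the following Python sketch gathers them into a record and packs coverage samples taken at three times the horizontal resolution into pixels. The record type, its field names, and both functions are hypothetical; they do not correspond to any defined EDID field or driver call, and a real renderer would also filter the samples to limit color fringing.

```python
from dataclasses import dataclass


@dataclass
class DisplayAttributes:
    """Hypothetical record of the six attributes listed above."""
    is_lcd: bool
    digital_link: bool
    striping: str            # "vertical" or "horizontal"
    fragment_order: str      # "RGB" or "BGR", left-to-right or top-to-bottom
    native_resolution: bool
    gamma: float


def horizontal_subpixel_ok(attrs):
    """This sketch handles only vertically striped LCDs at native resolution."""
    return attrs.is_lcd and attrs.native_resolution and attrs.striping == "vertical"


def pack_subpixels(coverage, attrs):
    """Map 3x-horizontal coverage samples (0..255) onto (r, g, b) pixels.

    Each group of three samples covers one pixel from left to right, so the
    physical fragment order decides which channel receives which sample.
    """
    if len(coverage) % 3 != 0:
        raise ValueError("coverage length must be a multiple of 3")
    pixels = []
    for i in range(0, len(coverage), 3):
        left, middle, right = coverage[i:i + 3]
        if attrs.fragment_order == "RGB":
            pixels.append((left, middle, right))
        elif attrs.fragment_order == "BGR":
            pixels.append((right, middle, left))
        else:
            raise ValueError("unknown fragment order")
    return pixels


panel = DisplayAttributes(True, True, "vertical", "BGR", True, 2.2)
if horizontal_subpixel_ok(panel):
    row = pack_subpixels([255, 0, 0, 0, 0, 255], panel)
```

The fragment order matters because the same coverage pattern must light the leftmost physical fragment regardless of whether that fragment is red or blue.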

Microsoft is developing a proposal to VESA to allow the identification of these characteristics in the monitor EDID (Extended Display Identification Data). For displays that are connected via a Plug and Play interface that supports DDC (display data channel) through a digital interface, such as the new Digital Visual Interface, it is recommended that monitor manufacturers implement the enhanced EDID. For devices that have an integrated display and that use a digital interface that does not support DDC, the display driver can return the appropriate flags. The display driver can also override the monitor settings, for cases such as landscape/portrait switching.

Displaced subpixel rendering performance will be significantly enhanced when the surfaces are kept in standard RGB format and are directly writeable by the CPU. As such, Windows 2000 drivers should ensure that engine-managed surfaces are used for the primary display and for all off-screen device bitmaps.

Call to Action

Build in mechanisms to allow the physical pixel attributes of the display device to be queried.

Make sure drivers are using engine-managed surfaces for the primary display and for off-screen device bitmaps.

3-D User Interfaces

3-D user interfaces will demand quality filtering. Text comprises a large portion of today's user interfaces, and if it's pushed into the third dimension, it will have to be as clear and legible in 3-D as it is in 2-D. Anisotropic filtering is a requirement for readability. Bilinear and trilinear filtering emphatically do not suffice. Note that anisotropic filtering for a 3-D user interface is much more important than for a game, which with its sustained high speed of animation is often not still long enough for the filtering deficiencies to be evident. For more details on anisotropic filtering, see the Direct3D reference rasterizer.

Texture hardware also has to support non-power-of-2 texture sizes. The reason is that most immediate applications of a 3-D UI will be to take existing 2-D UI and extrude it into 3-D, and current windows and popups are arbitrary non-power-of-2 sizes. Tiling is not required for non-power-of-2 textures. Note that in this application, texture surfaces are live and are ideally rendered using hardware accelerations. Note also that since the textures are live surfaces, mipmaps cannot be used.
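One way to see why this matters is to estimate the memory lost if a window-sized surface had to be padded up to power-of-2 dimensions. The padding scheme and the function names below are assumptions for illustration; the article does not describe how hardware without this support would cope.

```python
def next_pow2(n):
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def padding_waste(width, height):
    """Padded size and the fraction of the padded texture left unused."""
    padded_w, padded_h = next_pow2(width), next_pow2(height)
    unused = 1 - (width * height) / (padded_w * padded_h)
    return padded_w, padded_h, unused


padded_w, padded_h, unused = padding_waste(640, 480)
```

Under that assumption a 640x480 window would occupy a 1024x512 texture, leaving a little over 41% of it unused.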

If 3-D is to become pervasive in user interfaces, it will have to run in the resolution and the color depth that the user chooses for their desktop. As such, 3-D accelerations should work in all RGB color depths.

Call to Action

Implement anisotropic filtering.

Support non-power-of-2 texture sizes.

Support 3-D accelerations at all RGB desktop color depths.

Next-Generation Shared Memory Architectures

A likely area for advancement in PC graphics will be the adoption of next generation shared memory architectures. Shared memory architecture (SMA) implementations will have significant implications and possibilities for GDI+.

SMA Potential
Graphics rendering methods for the desktop are diverging. On the one hand, video hardware is advancing at an incredible pace, yielding tremendous speed and rendering quality. On the other hand, CPUs are getting ever faster and more powerful with the addition of instruction set extensions such as Intel MMX, Intel Streaming SIMD, and AMD 3DNow!, allowing software rendering to be ever faster and the algorithms ever more flexible and innovative. The problem is that performance-wise it is becoming progressively more expensive to mix the two methods. More specifically, rendering performance is critically dependent on the allocation of the rendering surface.

If the rendering surface is located in video memory, it permits the full benefit of graphics hardware accelerated rendering. Writes by the CPU to video memory surfaces are also acceptably fast--thanks to write combining--and throughput is typically 200 MB/s on the latest AGP (accelerated graphics port) systems. Read speeds, however, are terrible, typically maxing out at 12 MB/s on the latest AGP systems. This read performance is anathema to most MMX routines, which are typically read-modify-write by nature of their vector processing. It is also a problem for any routines that must explicitly do read or read-modify-write operations, such as is the case with almost all image processing filters or Microsoft DirectX Transform plug-ins.
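To put the two throughput figures in perspective, the sketch below estimates how long the CPU would need to write and to read back one full-screen surface. The 1024x768 32bpp surface is an assumed example, and MB is taken to mean one million bytes.

```python
WRITE_MB_PER_S = 200  # CPU writes to video memory, from the figures above
READ_MB_PER_S = 12    # CPU reads from video memory, from the figures above


def transfer_ms(width, height, bytes_per_pixel, mb_per_s):
    """Milliseconds needed to move one whole surface at the given rate."""
    size_bytes = width * height * bytes_per_pixel
    return size_bytes / (mb_per_s * 1_000_000) * 1000


write_ms = transfer_ms(1024, 768, 4, WRITE_MB_PER_S)
read_ms = transfer_ms(1024, 768, 4, READ_MB_PER_S)
```

Under these assumptions a full write takes roughly 16 ms, while reading the same surface back takes roughly 262 ms, so any routine that must read-modify-write in video memory is dominated by the read.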

If the rendering surface is located in regular system memory, CPU rendering is maximally efficient because the CPU data cache is leveraged for all the read or read-modify-write operations. However, the video hardware obviously cannot be used.

AGP memory is even worse for the rendering surface. While it can permit hardware-accelerated rendering, the typical latency involved with rendering to AGP memory results in terrible performance (although it is potentially still faster than the equivalent software rendering). In addition, AGP memory has to be marked as write-combined or write-through, meaning that CPU read performance is poor. Effectively, AGP memory is useful only for read-only surfaces by the video accelerator (such as textures), and is not at all useful as a rendering target.

All of this conspires to make it difficult to use both software and hardware rendering to the same surfaces, such as will happen in a composited desktop environment, or even frequently today in a web page. This problem was one of the reasons that Microsoft ChromeEffects™ had to employ system-memory surfaces for all its live web rendering surfaces. Web pages can contain components that are rendered using software routines such as DirectX Transform, which meant that for the aforementioned performance reasons they had to reside in cacheable system memory, even though the majority of rendering would be perfectly hardware acceleratable. So every time the web page updated, the page had to be updated using software rendering, and then the modified area had to be copied to an AGP or video memory texture.

With the advent of next generation shared memory architecture video accelerators, we face a unique time to address this problem by making it possible to have cache coherent AGP style surfaces. There will still be issues with hardware acceleration latency, and the fact that after hardware accelerator rendering the CPU cache data will have to be re-validated, but nevertheless the potential wins are enormous.

Because shared memory architectures are on the same side of the bus as the CPU, they allow a host of possibilities that have not been possible before. For example, it may be possible to employ the graphics accelerator to do all printer rasterization, giving benefits such as hardware acceleration when printing 3-D content.

We are also looking at possibilities such as pageable SMA memory.

SMA Considerations
This section contains information about what must be considered for SMA implementations.

Physical Contiguity. One of the primary advantages of AGP memory, from an operating system point of view, is that by virtue of the GART (Graphics Address Re-mapping Table) it allows pages to be committed dynamically, with no restriction as to their physical contiguity. This is an extremely important property, as it allows system memory to be used for other purposes by the system when AGP memory is not needed. The operating system cannot honor any allocation requests for physically contiguous pages of any size more than a couple of pages except at boot time. It is strongly recommended, as a result, that any SMA solutions employ a GART-like solution to allow physically discontiguous system memory pages to be dynamically committed and freed for use by the video accelerator.
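The idea of a GART-like remapping table can be sketched as follows: the accelerator sees a contiguous run of aperture pages, while the backing physical pages can be scattered and are committed and released on demand. The class, its methods, and the 4 KB page size are assumptions for illustration and do not describe any particular chipset.

```python
PAGE_SIZE = 4096  # assumed page size


class RemapTable:
    """Toy GART-like table: contiguous aperture pages to scattered physical pages."""

    def __init__(self, aperture_pages):
        self.entries = [None] * aperture_pages

    def commit(self, free_physical_pages, count):
        """Back `count` contiguous aperture pages with any free physical pages.

        Returns the index of the first aperture page in the run.
        """
        if count < 1 or count > len(free_physical_pages):
            raise MemoryError("not enough free physical pages")
        run = 0
        for index, entry in enumerate(self.entries):
            run = run + 1 if entry is None else 0
            if run == count:
                start = index - count + 1
                for slot in range(start, start + count):
                    # The physical pages need not be adjacent to each other.
                    self.entries[slot] = free_physical_pages.pop()
                return start
        raise MemoryError("no contiguous aperture range available")

    def release(self, start, count, free_physical_pages):
        """Return the physical pages to the system for other uses."""
        for slot in range(start, start + count):
            free_physical_pages.append(self.entries[slot])
            self.entries[slot] = None

    def translate(self, aperture_address):
        """Translate an aperture address to a physical address."""
        page, offset = divmod(aperture_address, PAGE_SIZE)
        physical_page = self.entries[page]
        if physical_page is None:
            raise LookupError("aperture page is not committed")
        return physical_page * PAGE_SIZE + offset


free_pages = [907, 12, 455]          # scattered physical page numbers
table = RemapTable(aperture_pages=8)
first = table.commit(free_pages, 3)  # the surface looks contiguous to the accelerator
```

When the surface is no longer needed, `release` hands the pages back, so no system memory stays reserved for the accelerator while it is idle.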

Memory Considerations. With SMA implementations, in addition to implications of reduced memory bandwidth available to the CPU, there are significant memory considerations.

Figure 1. Performance vs. Memory

Figure 1 illustrates the memory constraints when running the industry standard Ziff-Davis Business Winstone benchmark with various amounts of available system memory.

A modern system with 64 MB of memory is more than twice as fast as the same system with only 32 MB. Every megabyte of memory taken away from the CPU on a 64 MB machine will have a marked impact on overall system performance that cannot be made up by faster graphics rendering.

Other high-end performance benchmarks will show even more significant performance penalties when memory is committed but not used. It is therefore not acceptable for the video accelerator or the driver to reserve a large portion of system memory at POST (power-on self-test) or boot time for its exclusive use.

Call to Action

Vendors developing shared memory architectures in which the graphics controller shares the high-speed memory bus with the CPU should engage Microsoft to discuss the architecture of their products and the methods for implementing the above functionality, such as cacheable shared memory and a GART-equivalent memory mapping model.

Continuous Tone Model for Printers

GDI+ presents a continuous tone painting model to applications, which has significant implications for printers. The GDI+ painting model permits the programmer to assume a destination surface with full continuous tone support, allowing the programmer to employ features such as alpha blending without having to consider the underlying format of the frame buffer. More specifically, even though a printer's frame buffer might be kept as a halftoned 1bpp surface, the application can request to do a blend operation of the frame buffer contents with a source bitmap, and the blend has to be done using continuous tone semantics as if the destination surface is kept as a full continuous tone RGB format.
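The following Python sketch illustrates why a continuous tone backing helps: the blend is computed on 8-bit gray values and only the result is halftoned to 1bpp. The 2x2 ordered dither, the grayscale simplification, and the function names are assumptions for the example; they are not drawn from any printer language.

```python
BAYER_2X2 = [[0, 2], [3, 1]]  # ordered dither pattern, assumed for the sketch


def blend(src_gray, alpha, dst_gray):
    """Continuous tone blend of an 8-bit source over an 8-bit destination."""
    return (src_gray * alpha + dst_gray * (255 - alpha) + 127) // 255


def halftone(gray, x, y):
    """Return 1 (ink) or 0 (paper) for a gray value, 0 = black, 255 = white."""
    threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
    return 1 if gray < threshold else 0


def render_tile(src_gray, alpha, backing):
    """Blend into a continuous tone backing, then halftone the result."""
    blended = [[blend(src_gray, alpha, d) for d in row] for row in backing]
    return [
        [halftone(g, x, y) for x, g in enumerate(row)]
        for y, row in enumerate(blended)
    ]


white_paper = [[255, 255], [255, 255]]
tile = render_tile(src_gray=0, alpha=128, backing=white_paper)
```

Blending black at about half opacity over white yields a mid gray, which the dither renders as two inked dots out of four. A destination stored only as 1bpp could not hold that intermediate value for later blends.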

Printer languages will have to be modified to support blending semantics, and it may be necessary to allow continuous tone frame buffers. It is expected, however, that the vast majority of printed pages will continue to be composed mostly of monochrome text, so GDI+ will provide the printer driver with hints that indicate when a page (or finer granularity areas on the page) must be rendered using a continuous tone backing.

Call to Action

Vendors who want to enhance their printer languages to support the GDI+ continuous tone and blending models should engage Microsoft to discuss how best this can be done.

Conclusion

This article has presented an overview of the graphics hardware direction for GDI+. The hardware directions of GDI+ will continue to advance and evolve, based in large part on your feedback. To offer comments and input on design, send mail to gdiphw@microsoft.com.

References

Blinn, Jim, A Ghost In A Snowstorm, IEEE Computer Graphics and Applications, January/February 1998

Blinn, Jim, Dirty Pixels, IEEE Computer Graphics and Applications, July, 1989

Porter and Duff, Compositing Digital Images, Computer Graphics, July 1984

AGP whitepaper: http://developer.intel.com/drg/mmx/AppNotes/agp.htm

Digital Display Working Group: http://www.ddwg.org

Windows DDK documentation: http://www.microsoft.com/whdc/devtools/ddk/default.mspx

Ziff-Davis Winstone benchmark: http://www.zdbop.com