    Case Studies in VMX128 Optimization

    Presentation from Gamefest 2010
    • File Size: 41.1 MB

        Arm yourself with the latest techniques for writing ultra-fast VMX code. This talk covers several VMX topics. First, a hands-on deep dive shows how performance bottlenecks in the 2006 CustomVFetch XDK sample were discovered, and how the VMX loop was rewritten for a 10x increase in performance by eliminating several expensive problems: branching penalties, load-hit-store (LHS) penalties, and the horrors of non-coalesced write-combined memory access. The talk demonstrates branching in VMX registers, packing 16-bit VMX floats, and solutions to many other problems that may be haunting your own code. We introduce a VMX random number generator to replace the expensive rand() operation. Next, we introduce an algorithm for generating convex hulls and another for generating and traversing k-d trees. Finally, we discuss a VMX algorithm for fast DXT block compression of color and normal maps using both 4-wide float and 16-wide byte operations.
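        To give a feel for what "branching in VMX registers" usually means, here is a minimal portable C++ sketch of the general compare-and-select idea: a per-lane comparison produces a bit mask, and the mask picks between two candidate results with no conditional jump. This is not code from the talk and does not use VMX128 intrinsics. The names Vec4, Mask4, cmp_gt, select and clamp_max are hypothetical, and the loops only emulate what a vector unit would do in a single instruction.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for a 4-wide float register and a 4-wide lane mask.
using Vec4  = std::array<float, 4>;
using Mask4 = std::array<std::uint32_t, 4>;

// Per-lane compare: each lane becomes all ones where a > b, otherwise zero.
Mask4 cmp_gt(const Vec4& a, const Vec4& b) {
    Mask4 m{};
    for (int i = 0; i < 4; ++i) {
        // 0 - 1 wraps to 0xFFFFFFFF for true; 0 - 0 stays 0 for false.
        m[i] = 0u - static_cast<std::uint32_t>(a[i] > b[i]);
    }
    return m;
}

// Per-lane select: take the bits of b where the mask is set, else the bits of a.
Vec4 select(const Vec4& a, const Vec4& b, const Mask4& m) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) {
        std::uint32_t abits = 0, bbits = 0;
        std::memcpy(&abits, &a[i], sizeof abits);   // bit-cast float to integer
        std::memcpy(&bbits, &b[i], sizeof bbits);
        const std::uint32_t rbits = (abits & ~m[i]) | (bbits & m[i]);
        std::memcpy(&r[i], &rbits, sizeof rbits);   // bit-cast back to float
    }
    return r;
}

// Example use: clamp each lane of v to an upper limit without an if statement.
Vec4 clamp_max(const Vec4& v, const Vec4& limit) {
    return select(v, limit, cmp_gt(v, limit));
}
```

        Calling clamp_max on the lanes {1, 5, 3, 9} with a limit of 4 in every lane leaves 1 and 3 untouched and replaces 5 and 9 with 4. Both outcomes are computed for every lane and the mask discards the unwanted one, which is the trade that makes this pattern attractive on hardware where a mispredicted branch is costly.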
    • Supported Operating System

      Windows 7

      • PowerPoint presentation and WMA audio