Recently, sort-middle triangle rasterization, implemented in software on a manycore GPU with vector units (Larrabee), has been proposed as an alternative to hardware rasterization. The main argument is that only a fraction of the time per frame is spent sorting and rasterizing triangles. However, is this still valid in the context of geometry amplification, when the number of primitives increases rapidly? A REYES-like approach that sorts parametric patches instead could avoid many of the problems caused by tiny triangles. To demonstrate that software rasterization with geometry amplification can work in real time, we implement a tile-based sort-middle rasterizer in CUDA and analyze its behavior: first, we adaptively subdivide rational bicubic Bézier patches; then we sort these into buckets, and for each bucket we dice, grid-shade, and rasterize the micropolygons into the corresponding tile using on-chip caches. Despite being limited by the amount of available shared memory, the number of registers, and the lack of an L3 cache, we manage to rasterize 1600×1200 images, containing 200k sub-patches, at 10-12 fps on an NVIDIA GTX 280. This is about 3x to 5x slower than a hybrid approach that subdivides with CUDA and uses the hardware rasterizer. We hide cracks caused by adaptive subdivision using a flatness metric in combination with rasterizing Bézier convex hulls. Using a k-buffer with a fuzzy z-test, we can render transparent objects despite the overlap our approach creates. Further, we introduce the notion of true back-patch culling, which allows us to simplify crack hiding and sampling.
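
The sort-middle step described above can be illustrated by its core binning operation: each subdivided patch is assigned, via the screen-space bounding box of its convex hull, to every tile it overlaps, so that each tile's bucket can later be diced and rasterized independently. The following is a minimal CPU-side sketch, not the paper's CUDA implementation; the names (`SubPatch`, `binPatches`) and the 16-pixel tile size are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// Assumed tile edge length in pixels (for illustration only).
constexpr int TILE = 16;

// Hypothetical sub-patch record: id plus the screen-space bounding
// box of its Bézier convex hull after adaptive subdivision.
struct SubPatch {
    int id;
    float minX, minY, maxX, maxY;
};

// Sort-middle binning: returns one bucket (list of sub-patch ids) per
// screen tile, row-major. A patch whose bounding box spans several
// tiles is pushed into every overlapped bucket.
std::vector<std::vector<int>> binPatches(const std::vector<SubPatch>& patches,
                                         int width, int height) {
    int tilesX = (width  + TILE - 1) / TILE;
    int tilesY = (height + TILE - 1) / TILE;
    std::vector<std::vector<int>> buckets(tilesX * tilesY);
    for (const SubPatch& p : patches) {
        // Clamp the tile range covered by the bounding box to the screen.
        int x0 = std::max(0, static_cast<int>(p.minX) / TILE);
        int y0 = std::max(0, static_cast<int>(p.minY) / TILE);
        int x1 = std::min(tilesX - 1, static_cast<int>(p.maxX) / TILE);
        int y1 = std::min(tilesY - 1, static_cast<int>(p.maxY) / TILE);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                buckets[ty * tilesX + tx].push_back(p.id);
    }
    return buckets;
}
```

In the actual pipeline this binning would run on the GPU and feed the per-tile dice, grid-shade, and rasterize stages, which operate out of on-chip shared memory.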