This talk will describe research on evolving general-purpose graphics processing unit (GPGPU) architectures to better support irregular parallel workloads. The first part of the talk will summarize research on implementing hardware transactional memory on GPUs, covering the challenges involved and our proposed solution, KILO-TM. Our analysis shows that KILO-TM captures 59% of the performance of fine-grained locking while being, on average, 128× faster than executing all transactions serially, for an estimated hardware area overhead of 0.5% of a contemporary GPU. The second part of the talk will summarize recent efforts to port memcached to an existing AMD Fusion APU, where we observe up to a 7.5× performance increase when executing the key-value lookup handler on the integrated GPU.