Traditional online transaction processing (OLTP) systems face two major challenges in utilizing modern hardware: (1) exploiting the abundant thread-level parallelism offered by multicores and (2) taking advantage of aggressive micro-architectural features. On the one hand, the inherent communication in traditional high-performance OLTP leads to scalability bottlenecks on today's multicore and multisocket hardware. On the other hand, the large instruction footprint of transactions causes OLTP to waste around half of its execution cycles on memory stalls.
In this talk, I first classify the most problematic critical sections of an OLTP system and show how one can eliminate them through physiological partitioning in the context of a shared-everything architecture. Then, I demonstrate that the worker threads of an OLTP system usually execute similar transactions in parallel, meaning that threads running on different cores share a non-negligible fraction of their instructions. By spreading the execution of a transaction over multiple cores through either programmer-transparent or transaction-aware techniques, we effectively provide ample L1 instruction cache capacity, exploit the instruction commonality among transactional threads, and significantly reduce instruction misses by localizing instructions to cores as threads migrate.
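To make the partitioning idea concrete, the following is a minimal sketch (hypothetical code, not the actual system): each worker thread exclusively owns a range of the key space and applies every request for that range itself, so it can update its private partition without acquiring any latch — the kind of critical-section elimination that partitioning enables. The names `owner_of`, `Worker`, and `put` are illustrative assumptions.

```python
# Hypothetical sketch of thread-level data partitioning:
# each key maps to exactly one owning worker thread, so the owner
# updates its partition without latches or other critical sections.
import queue
import threading

NUM_WORKERS = 4

def owner_of(key):
    # Partition the key space across worker threads (illustrative scheme).
    return key % NUM_WORKERS

class Worker(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()   # requests routed to this owner
        self.table = {}              # private partition: no latching needed

    def run(self):
        while True:
            op, key, val, done = self.inbox.get()
            if op == "put":
                self.table[key] = val
            done.set()               # signal the caller that the op applied

workers = [Worker() for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

def put(key, val):
    # Route the request to the single thread that owns this key.
    done = threading.Event()
    workers[owner_of(key)].inbox.put(("put", key, val, done))
    done.wait()

put(10, "a")  # executed by the owner of key 10; no shared latch is taken
```

The design point is that mutual exclusion comes from ownership (request routing) rather than from locking shared structures, which is why the contended critical sections disappear.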