Chip Multiprocessors (CMPs) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy-inefficient to compete against conventional processors. In this paper, we refute this claim. To do so, we first identify the main sources of dynamic energy consumption in TLS. Then, we present simple energy-saving optimizations that cut the energy cost of TLS by over 60% on average with minimal performance impact. The resulting TLS CMP, populated with four 3-issue cores,
speeds up full SPECint 2000 codes by a factor of 1.27 on average, while keeping the fraction of the chip’s energy consumption due to TLS to only 20%. Compared to a 6-issue superscalar at the same frequency, the TLS CMP is on average faster, while consuming only 85% of its total on-chip power.