As data centers increasingly focus on energy efficiency, it becomes important to develop low-power implementations of the applications that run on them. Data compression plays a critical role in data centers by reducing storage and communication costs. This work focuses on building a low-power, high-performance implementation of canonical Huffman encoding. We develop a number of hardware and software implementations targeting a Xilinx Zynq FPGA, an ARM Cortex-A9, and an Intel Core i7. Despite the sequential nature of canonical Huffman encoding, we show that our hardware-accelerated implementation is substantially more energy efficient than both the ARM and Intel Core i7 implementations. Compared to highly optimized software running on the ARM processor, our hardware-accelerated implementation has approximately 15 times more throughput with 10% higher power usage, resulting in an 8X benefit in energy efficiency (measured in encodings/Watt). Additionally, our hardware-accelerated implementation is up to 80% faster and over 230 times more energy efficient than a highly optimized Core i7 implementation.