Configuration overhead seriously limits the usefulness of FPGA partial reconfiguration. In this paper, we propose a combination of two techniques to minimize this overhead. First, we design and implement fully streaming DMA engines to saturate the available configuration throughput. Second, we exploit a simple form of data redundancy to compress the configuration bitstreams, and we implement an intelligent Internal Configuration Access Port (ICAP) controller that performs decompression at runtime. The results show that our design achieves an effective configuration data transfer throughput of up to 1.2 Gbytes/s, well above the 400 Mbytes/s theoretical upper bound on raw data transfer throughput; this is possible because the bitstream is transferred in compressed form and expanded by the controller. Specifically, our fully streaming DMA engines reduce the configuration time from the range of seconds to the range of milliseconds, a more than 1000-fold improvement. In addition, our simple compression scheme reduces bitstream size by up to 75% and requires a decompression circuit with negligible hardware overhead.
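
The relationship between the raw bound, the compression ratio, and the effective throughput can be illustrated with a short calculation. The sketch below is not taken from the paper's implementation; it assumes an idealized model in which the configuration port is kept fully saturated with compressed data and decompression introduces no stalls, and it reads 1.2 Gbytes/s as 1200 Mbytes/s. The function names are hypothetical.

```python
def effective_throughput(raw_mb_per_s: float, size_reduction: float) -> float:
    """Uncompressed-equivalent throughput (MB/s) under the idealized model.

    raw_mb_per_s   -- raw transfer rate of compressed data, in MB/s
    size_reduction -- fraction by which compression shrinks the bitstream
                      (0.75 means the compressed file is 25% of the original)
    """
    if not 0.0 <= size_reduction < 1.0:
        raise ValueError("size_reduction must be in [0, 1)")
    # Each transferred byte stands for 1 / (1 - reduction) original bytes.
    return raw_mb_per_s / (1.0 - size_reduction)


def required_reduction(raw_mb_per_s: float, effective_mb_per_s: float) -> float:
    """Size reduction needed to reach a given effective throughput."""
    if raw_mb_per_s <= 0 or effective_mb_per_s < raw_mb_per_s:
        raise ValueError("need 0 < raw <= effective")
    return 1.0 - raw_mb_per_s / effective_mb_per_s


if __name__ == "__main__":
    # Best case quoted above: 75% reduction at the 400 MB/s raw bound.
    print(effective_throughput(400.0, 0.75))
    # Reduction that would account for 1200 MB/s effective at 400 MB/s raw.
    print(required_reduction(400.0, 1200.0))
```

Under this model, a 75% size reduction would allow at most a fourfold gain over the raw bound, and an effective rate of 1.2 Gbytes/s corresponds to a size reduction of about two thirds, which is consistent with the figures quoted above.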