Multimedia and communication algorithms from the embedded system domain often make extensive use of floating-point arithmetic. Due to the complexity and expense of the floating-point hardware, these algorithms are usually converted to fixed point operations, or implemented using floating-point emulation in software. This study presents the design and implementation of custom floating-point units, leveraging the partial reconfiguration feature of state-of-the-art FPGAs. The custom floating-point units can be dynamically configured, loaded, and executed when needed by software applications. The system is binary compliant with the conventional MIPS architecture and the IEEE-754 standard, and supports most of the floating-point operations and relevant functionalities. Furthermore, we present various customization strategies and construct a set of optimized functional modules to meet different application demands or requirements. Using LINPACK as a floating-point intensive example, we replace a sequence of 25 instructions with a custom instruction, and demonstrate an overall 80x application speedup.