Binary-input compressive sensing (BiCS) has recently been applied to wireless communications as a modulated coding scheme for seamless rate adaptation. Unlike conventional channel codes, which generate binary symbols with exclusive-OR (XOR) operations, BiCS generates multilevel symbols through weighted-sum operations. Although BiCS can be decoded by message passing, the decoder must compute a convolution of probability functions in each iteration, and this high decoding complexity has prevented the technique from being used in practice. In this paper, we propose a fast BiCS decoding algorithm and a corresponding partial-parallel hardware design. We first build lookup tables to handle the computationally intensive convolution. With these tables, we convert the convolution of probabilities into a polynomial of exponential terms. This key step allows log-likelihood ratios to be used as the messages in message-passing decoding, and a fast algorithm is then developed through approximate computing. We further design a partial-parallel hardware decoder. To avoid memory collisions, we propose a multilevel cyclic-shift approach to generate the CS measurement matrix, and we design horizontal unit processors that use the proposed tables for iterative computing. Our analyses show that the proposed fast algorithm reduces the number of multiplications by nearly 90%. The decoding speed of our field-programmable gate array (FPGA) design reaches the range of communication rates in modern wireless networks.
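
To make the cost of baseline message-passing decoding concrete, the following minimal Python sketch shows the per-iteration convolution for a single measurement. It is an illustration under stated assumptions, not the fast algorithm or the lookup tables proposed here: the function names `weighted_sum_pmf` and `check_to_bit_llr` are hypothetical, the weights are assumed to be integers, the noise is assumed to be additive Gaussian with standard deviation `sigma`, and the log-likelihood ratio is taken as log P(b = 0) / P(b = 1).

```python
import numpy as np

def weighted_sum_pmf(weights, p_one):
    """PMF of sum_i w_i * b_i for independent bits with P(b_i = 1) = p_one[i].

    Assumes integer weights. Returns (offset, pmf), where
    pmf[k] = P(sum == offset + k).
    """
    offset = sum(min(w, 0) for w in weights)  # smallest reachable sum
    pmf = np.array([1.0])
    for w, p in zip(weights, p_one):
        two_point = np.zeros(abs(w) + 1)
        if w >= 0:
            two_point[0] += 1.0 - p   # bit = 0 contributes 0
            two_point[w] += p         # bit = 1 contributes w
        else:
            two_point[0] += p         # bit = 1 contributes w (index 0 after shift)
            two_point[-w] += 1.0 - p  # bit = 0 contributes 0
        # One convolution per neighbouring bit: this is the costly step.
        pmf = np.convolve(pmf, two_point)
    return offset, pmf

def check_to_bit_llr(y, weights, p_one, j, sigma):
    """LLR message for bit j from one noisy measurement y (Gaussian noise assumed)."""
    other_w = list(weights[:j]) + list(weights[j + 1:])
    other_p = list(p_one[:j]) + list(p_one[j + 1:])
    offset, pmf = weighted_sum_pmf(other_w, other_p)
    s = offset + np.arange(len(pmf))  # possible sums of the other bits
    lik0 = np.sum(pmf * np.exp(-(y - s) ** 2 / (2 * sigma ** 2)))
    lik1 = np.sum(pmf * np.exp(-(y - weights[j] - s) ** 2 / (2 * sigma ** 2)))
    return float(np.log(lik0 / lik1))

# Example: a measurement y = 1*b0 + 2*b1 + noise, with uniform priors on both bits.
offset, pmf = weighted_sum_pmf([1, 2], [0.5, 0.5])
llr_b0 = check_to_bit_llr(0.0, [1, 2], [0.5, 0.5], j=0, sigma=0.5)
```

In this sketch every outgoing message repeats a chain of convolutions over the neighbouring bits, which is the per-iteration cost that motivates replacing the convolution with table lookups.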