Multi-user multiple-input multiple-output (MU-MIMO) is the latest communication technology that promises to linearly increase the wireless capacity by deploying more antennas on access points (APs). However, the large number of MIMO antennas will generate a huge amount of digital signal samples in real time. This imposes a grand challenge on the AP design by multiplying the computation and the I/O requirements to process the digital samples. This paper presents BigStation, a scalable architecture that enables realtime signal processing in large-scale MIMO systems which may have tens or hundreds of antennas. Our strategy to scale is to extensively parallelize the MU-MIMO processing on many simple and low-cost commodity computing devices. Our design can incrementally support more antennas by proportionally adding more computing devices. To reduce the overall processing latency, which is a critical constraint for wireless communication, we parallelize the MU-MIMO processing with a distributed pipeline based on its computation and communication patterns. At each stage of the pipeline, we further use data partitioning and computation partitioning to increase the processing speed. As a proof of concept, we have built a BigStation prototype based on commodity PC servers and standard Ethernet switches. Our prototype employs 15 PC servers and can support real-time processing of 12 software radio antennas. Our results show that the BigStation architecture is able to scale to tens to hundreds of antennas. With 12 antennas, our BigStation prototype can increase wireless capacity by 6.8 with a low mean processing delay of 860s. While this latency is not yet low enough for the 802.11 MAC, it already satisfies the real-time requirements of many existing wireless standards, e.g., LTE and WCDMA.