Solving classification problems with statistical techniques requires appropriately labelled training data. In the case of multi-channel data, however, the labels may be available only in aggregate form rather than as separate labels for each individual channel. Standard techniques, which rely on a model trained to predict each channel separately, are therefore precluded. In this paper we present a new method of training neural network classifiers from aggregate labels. This technique allows the network to learn which significant events on individual channels give rise to the given labels. We apply this training method to two synthetic (but, in the second case, realistic) problems and compare the results with those from a classifier trained on the accurate channel labels, which would usually not be available. On previously unseen test data for the two problems, 97.75% and 99.1% of feature vectors, respectively, were classified correctly. These figures are only 0.5% and 0.1% lower than those obtained from classifiers trained on accurate labels for all channels.