Multiclass action detection in complex scenes is a challenging problem because of cluttered backgrounds and large intra-class variations within each type of action. To achieve efficient and robust action detection, we characterize a video as a collection of spatio-temporal interest points and locate actions by finding the spatio-temporal video subvolumes with the highest mutual information score for each action class. A random forest is constructed to efficiently generate discriminative votes from individual interest points, and a fast top-K subvolume search algorithm is developed to find all action instances in a single round of search. Such a top-K search can be performed on down-sampled score volumes for more efficient localization without significantly degrading performance. Experiments on the challenging MSR Action Dataset II validate the effectiveness of the proposed multiclass action detection method. The detection speed is several orders of magnitude faster than that of existing methods.
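To make the subvolume search idea concrete, the following is a minimal sketch, not the search algorithm proposed here. It assumes that the per-point votes for one action class have already been accumulated into a dense score volume indexed by (x, y, t), with positive entries supporting the class and negative entries opposing it. It swaps in an exhaustive search over spatial windows combined with Kadane's maximum-sum scan over time, and obtains the top-K results greedily by suppressing each found subvolume and searching again, rather than in a single round. The function names `max_subvolume`, `top_k_subvolumes`, and `downsample` are hypothetical and chosen for illustration only.

```python
import numpy as np


def max_subvolume(scores):
    """Return (best_sum, (x0, x1, y0, y1, t0, t1)) with inclusive bounds.

    Exhaustive over spatial windows, Kadane over time. This is an
    illustrative stand-in, not the paper's fast search.
    """
    X, Y, T = scores.shape
    # Integral image over x and y, kept separately for every frame t.
    integral = np.zeros((X + 1, Y + 1, T))
    integral[1:, 1:, :] = scores.cumsum(axis=0).cumsum(axis=1)
    best_sum, best_box = -np.inf, None
    for x0 in range(X):
        for x1 in range(x0, X):
            for y0 in range(Y):
                for y1 in range(y0, Y):
                    # Per-frame score of the spatial window [x0..x1] x [y0..y1].
                    frame = (integral[x1 + 1, y1 + 1] - integral[x0, y1 + 1]
                             - integral[x1 + 1, y0] + integral[x0, y0])
                    # Kadane's scan: best contiguous temporal interval.
                    run, start = 0.0, 0
                    for t in range(T):
                        if run <= 0:
                            run, start = frame[t], t
                        else:
                            run += frame[t]
                        if run > best_sum:
                            best_sum = run
                            best_box = (x0, x1, y0, y1, start, t)
    return best_sum, best_box


def top_k_subvolumes(scores, k):
    """Greedy top-K: find the best subvolume, zero it out, and repeat."""
    work = scores.astype(float).copy()
    results = []
    for _ in range(k):
        s, box = max_subvolume(work)
        if s <= 0:  # nothing with positive evidence is left
            break
        results.append((s, box))
        x0, x1, y0, y1, t0, t1 = box
        work[x0:x1 + 1, y0:y1 + 1, t0:t1 + 1] = 0.0  # suppress the detection
    return results


def downsample(scores, factor):
    """Sum-pool the spatial axes by `factor`; the time axis is untouched."""
    X, Y, T = scores.shape
    Xc, Yc = X // factor, Y // factor
    trimmed = scores[:Xc * factor, :Yc * factor, :]
    return trimmed.reshape(Xc, factor, Yc, factor, T).sum(axis=(1, 3))
```

In this sketch, running `top_k_subvolumes(downsample(scores, 2), k)` searches a volume with a quarter of the spatial cells, and the returned spatial bounds are in units of the coarse grid. Sum pooling is an assumption made here so that subvolume scores aligned to the coarse grid are preserved exactly.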