In this paper we study the problem of one-shot face recognition, with the goal of building a large-scale face recognizer capable of recognizing a substantial number of persons. Since face recognition can leverage a large-scale dataset to learn a good face representation, our study shows that the poor generalization of the one-shot classes is mainly caused by data imbalance, which cannot be effectively addressed by the multinomial logistic regression widely used as the final classification layer in convolutional neural networks. To solve this problem, we propose a novel supervision signal, the underrepresented-classes promotion (UP) loss term, which aligns the norms of the weight vectors of the one-shot classes (a.k.a. underrepresented classes) with those of the normal classes. Added to the standard cross-entropy loss, this new term effectively promotes the underrepresented classes in the learned model and leads to a remarkable improvement in face recognition performance. Experimental results on a benchmark dataset of 21,000 persons show that the new loss term improves the recognition coverage rate from 25.65% to 77.48% at a precision of 99% for underrepresented classes, while maintaining an overall top-1 accuracy of 99.8% for normal classes.
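The norm-alignment idea can be sketched as a penalty added to the cross-entropy loss: push the (squared) norms of the one-shot-class weight vectors toward the average over the normal classes. The NumPy implementation below is an illustrative assumption, not the paper's exact formulation; the function name, use of squared norms, and index arguments are choices made here for clarity.

```python
import numpy as np

def up_loss(W, normal_idx, one_shot_idx):
    """Underrepresented-classes promotion (UP) term, sketched.

    Encourages the squared norms of the one-shot-class weight vectors
    to match the average squared norm of the normal-class weight vectors.

    W: (num_classes, feat_dim) weight matrix of the final classification layer.
    normal_idx / one_shot_idx: row indices of normal and one-shot classes.
    Returns a scalar penalty to be added to the usual cross-entropy loss.
    """
    # Target: average squared norm over the normal (many-shot) classes.
    alpha = np.mean(np.sum(W[normal_idx] ** 2, axis=1))
    # Penalize one-shot classes whose squared norm deviates from the target.
    sq_norms = np.sum(W[one_shot_idx] ** 2, axis=1)
    return np.mean((sq_norms - alpha) ** 2)
```

In training, this term would be weighted and summed with the cross-entropy loss, so that gradient descent grows the otherwise-small weight vectors of classes seen only once.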