Microsoft Research Blog

Learning to teach: Mutually enhanced learning and teaching for artificial intelligence

December 5, 2018 | By Fei Tian, Researcher

Teaching matters. From an individual perspective, learning entirely on one’s own is never ideal: a student learns more effectively with a teacher’s guidance and perspective. From a societal perspective, teaching is how civilization is passed on to the next generation. Human teachers have three concrete responsibilities: providing students with qualified teaching material (for example, textbooks); defining the skill set students should master (for example, algebra or advanced calculus); and setting suitable learning objectives (for example, course projects and exams) that measure how well students are learning and provide the basis for feedback.

The way human teachers and students interact offers a useful analogy for understanding artificial intelligence and machine learning problems. Figure 1 illustrates the basic components of a typical machine learning process. It is an optimization process with three ingredients: the training data D, a model space Ω, and a loss function L. Training searches Ω for the model that minimizes L on D.

Figure 1 – The typical machine learning process.
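
The three ingredients of Figure 1 can be made concrete with a deliberately tiny Python sketch. Everything in it is an illustrative assumption rather than part of our work: D is four (x, y) pairs, the model space Ω is the family of one-parameter linear models f_w(x) = w·x, L is mean squared error, and the function names `model`, `mse_loss`, `mse_grad`, and `train` are hypothetical. Training is the optimization that searches Ω for the w minimizing L on D, done here by plain gradient descent.

```python
# Toy instance of Figure 1 (illustrative assumptions, not our actual setup):
#   D = training data, Omega = linear models f_w(x) = w * x, L = mean squared error.


def model(w, x):
    """A member of the model space Omega, indexed by the scalar weight w."""
    return w * x


def mse_loss(w, data):
    """Loss L: mean squared error of f_w over the data set."""
    return sum((model(w, x) - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    """Derivative of mse_loss with respect to w."""
    return sum(2 * x * (model(w, x) - y) for x, y in data) / len(data)


def train(data, w0=0.0, lr=0.1, steps=200):
    """The optimization: gradient descent over Omega to minimize L on D."""
    w = w0
    for _ in range(steps):
        w -= lr * mse_grad(w, data)
    return w


D = [(0.5, 1.1), (1.0, 1.9), (1.5, 3.2), (2.0, 3.9)]  # roughly y = 2x
w_star = train(D)
print(f"learned w = {w_star:.3f}, training loss = {mse_loss(w_star, D):.4f}")
```

Each teaching role corresponds to controlling one argument of this process: which examples make up D, which family the model is drawn from, and which function plays the role of L.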

Figure 2 draws an analogy between human teaching and teaching in AI, built on the three responsibilities of teachers described above. First, proper training data must be chosen for the AI student, just as teachers choose textbooks. We call this data teaching. Second, a good hypothesis space must be designed for the student model, just as teachers define the skill set their students should master. We call this model space teaching. Third, appropriate loss functions must be set for optimizing student models, just as human teachers design exams. We call this loss function teaching.

Figure 2 – An analogy of human teaching and teaching for AI.

We proposed an effective and efficient framework called “Learning to Teach” (L2T), published at ICLR 2018. L2T aims to discover the best teaching strategy for AI fully automatically, taking into account the differing abilities of various students while letting students and teachers grow together. We showed that L2T can discover a data order that reduces the training data needed by the student model by 40 percent. In our NeurIPS 2018 paper, Learning to Teach with Dynamic Loss Functions, we extended the L2T framework from data teaching to loss function teaching.
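
Data teaching can be pictured as a teacher sitting between the data loader and the student, deciding which examples in each minibatch the student actually trains on. The Python sketch below shows only that control flow. The filtering rule in the hypothetical `teacher_keep` function (train first on examples the student already fits, then relax the threshold as training progresses) is a hand-written stand-in; in L2T the teacher’s policy is learned automatically rather than hand-written. The student, the data, and all hyperparameters are likewise toy assumptions.

```python
import random

# Toy data-teaching loop (hypothetical): the student is f_w(x) = w * x trained with
# squared error, and a fixed rule plays the teacher. In L2T the teacher's policy is
# learned; this rule exists only to show where the teacher sits in the loop.


def example_loss(w, x, y):
    """Squared error of the student on one example."""
    return (w * x - y) ** 2


def teacher_keep(w, x, y, progress, threshold=1.0):
    """Hypothetical teacher rule: keep examples whose loss is below a threshold
    that grows as training progresses (progress runs from 0 toward 1)."""
    allowed = threshold / (1.0 - progress)
    return example_loss(w, x, y) <= allowed


def train_with_teacher(data, steps=100, batch_size=4, lr=0.2, seed=0):
    """Train the student on teacher-filtered minibatches; return w and examples used."""
    rng = random.Random(seed)
    w, seen = 0.0, 0
    for t in range(steps):
        batch = rng.sample(data, batch_size)
        kept = [(x, y) for x, y in batch if teacher_keep(w, x, y, t / steps)]
        if not kept:
            continue  # the teacher rejected the whole minibatch: no update this step
        seen += len(kept)
        grad = sum(2 * x * (w * x - y) for x, y in kept) / len(kept)
        w -= lr * grad
    return w, seen


D = [(i / 10, 2 * i / 10) for i in range(-10, 11)]  # noise-free data with y = 2x
w, seen = train_with_teacher(D)
print(f"w = {w:.4f}, examples used = {seen} of {100 * 4}")
```

Because the threshold relaxes as `progress` approaches 1, this particular rule filters mostly in the early steps; `seen` counts how many examples the student consumed in total.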

The goal of loss function teaching is to automatically discover the best loss function for training the student model, and thereby improve the student model’s performance. We set two requirements for loss function teaching. The first is adaptivity: the machine teacher should set different loss functions at different phases of the student model’s training. The second is dynamism: the machine teacher should keep optimizing itself to improve its teaching ability, so that teacher and student grow together. To satisfy both requirements, we make the loss function a neural network Lϕ(fω(x), y) with coefficients ϕ, and use a teacher model μθ that dynamically sets the coefficients ϕt based on the student’s training status st. At each timestep t, the student model is then trained with the loss function produced by the teacher model, as shown in Figure 3.

Figure 3 – The training process of the student model (the yellow line on the 2D surface at the bottom) under the guidance of the different loss functions output by the teacher model (the colored mesh surfaces).
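
A minimal Python sketch of this loop follows, under simplifying assumptions that are ours, not the paper’s: the student is a one-parameter logistic classifier f_w(x) = sigmoid(w·x); instead of a neural network, Lϕ is a class-weighted cross-entropy whose single coefficient ϕ ∈ (0, 1) trades off the positive-class and negative-class terms; the student status st is reduced to the fraction of training completed, t/T; and the teacher μθ is a two-parameter logistic function of st. The names `teacher`, `loss_grad`, and `train_student` are hypothetical.

```python
import math

# Toy dynamic-loss loop. Assumptions (ours, not the paper's):
#   student:  f_w(x) = sigmoid(w * x), a one-parameter logistic classifier
#   loss:     L_phi(p, y) = -[phi * y * log p + (1 - phi) * (1 - y) * log(1 - p)]
#   status:   s_t = t / T, the fraction of training completed
#   teacher:  mu_theta(s_t) = sigmoid(theta[0] + theta[1] * s_t), giving phi_t in (0, 1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher(theta, s):
    """mu_theta: read the student status s_t and output the loss coefficient phi_t."""
    return sigmoid(theta[0] + theta[1] * s)


def loss_grad(w, phi, data):
    """Derivative with respect to w of the mean of L_phi(f_w(x), y) over the data."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x)
        total += (-phi * y * (1 - p) + (1 - phi) * (1 - y) * p) * x
    return total / len(data)


def train_student(theta, data, steps=50, lr=0.5):
    """Gradient descent on the student, with a fresh loss L_{phi_t} at every step."""
    w, phis = 0.0, []
    for t in range(steps):
        phi_t = teacher(theta, t / steps)    # teacher sets this step's loss
        phis.append(phi_t)
        w -= lr * loss_grad(w, phi_t, data)  # student descends on L_{phi_t}
    return w, phis


data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 1), (1.0, 1), (2.0, 1)]
w, phis = train_student(theta=(0.0, 2.0), data=data)
print(f"w = {w:.3f}; phi_t went from {phis[0]:.2f} to {phis[-1]:.2f}")
```

With θ = (0, 2), ϕt rises from 0.5 toward sigmoid(1.96) ≈ 0.88, so this toy loss gradually shifts weight toward the positive-class term. Here θ is held fixed; meeting the dynamic requirement means θ must also be optimized as training proceeds.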

We also designed an efficient optimization method for the teacher model μθ based on reverse-mode differentiation, which lets us train the teacher with gradients instead of expensive reinforcement learning or evolutionary computation methods. We evaluated our approach on both image classification and neural machine translation. On the CIFAR-10 and CIFAR-100 image classification tasks and the IWSLT-14 German-English translation task, it achieved significant improvements over the standard cross-entropy loss and other previously proposed static loss functions, demonstrating the effectiveness of adaptive, dynamic loss functions.
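
The idea behind gradient-based teacher optimization can be sketched in Python on a toy problem: unroll the student’s training, score the final student, then run a reverse (adjoint) pass back through every update to obtain the gradient of that score with respect to the teacher parameters θ. The modelling choices are assumptions, not the paper’s: a one-parameter logistic student, a class-weighted cross-entropy Lϕ, a teacher ϕt = sigmoid(θ0 + θ1·t/T), and a teacher objective J equal to the plain cross-entropy of the final student on a small held-out dev set. The helper names (`partials`, `unroll`, `teacher_grad`) are hypothetical, and the adjoint recursions are derived by hand; at scale one would typically let an automatic differentiation framework do this.

```python
import math

# Toy reverse-mode ("hypergradient") sketch. All modelling choices are assumptions:
#   student:  p = sigmoid(w * x), starting from w_0 = 0
#   loss:     L_phi = -[phi * y * log p + (1 - phi) * (1 - y) * log(1 - p)]
#   teacher:  phi_t = sigmoid(theta[0] + theta[1] * s_t), with s_t = t / T
#   goal:     J = plain cross-entropy of the final student w_T on a dev set


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partials(w, phi, data):
    """Mean over data of G = dL_phi/dw, together with dG/dw and dG/dphi."""
    g = g_w = g_phi = 0.0
    for x, y in data:
        p = sigmoid(w * x)
        g += (-phi * y * (1 - p) + (1 - phi) * (1 - y) * p) * x
        g_w += p * (1 - p) * x * x * (phi * y + (1 - phi) * (1 - y))
        g_phi += (-y * (1 - p) - (1 - y) * p) * x
    n = len(data)
    return g / n, g_w / n, g_phi / n


def dev_loss_and_grad(w, dev):
    """J(w): mean cross-entropy on the dev set, and its derivative dJ/dw."""
    loss = grad = 0.0
    for x, y in dev:
        p = sigmoid(w * x)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        grad += (p - y) * x
    return loss / len(dev), grad / len(dev)


def unroll(theta, train, steps, lr):
    """Forward pass: train the student, taping (w_t, phi_t, s_t) for the reverse pass."""
    w, tape = 0.0, []
    for t in range(steps):
        s = t / steps
        phi = sigmoid(theta[0] + theta[1] * s)
        tape.append((w, phi, s))
        w -= lr * partials(w, phi, train)[0]
    return w, tape


def teacher_grad(theta, train, dev, steps=30, lr=0.5):
    """Return J(w_T) and dJ/dtheta, back-propagated through every student update."""
    w_T, tape = unroll(theta, train, steps, lr)
    J, adj = dev_loss_and_grad(w_T, dev)  # adj holds dJ/dw_{t+1} during the loop
    d_theta = [0.0, 0.0]
    for w, phi, s in reversed(tape):
        _, g_w, g_phi = partials(w, phi, train)
        d_phi = adj * (-lr * g_phi)            # dJ/dphi_t, as dw_{t+1}/dphi_t = -lr * dG/dphi
        d_theta[0] += d_phi * phi * (1 - phi)  # dphi_t/dtheta0 = phi_t * (1 - phi_t)
        d_theta[1] += d_phi * phi * (1 - phi) * s
        adj *= 1 - lr * g_w                    # chain rule: dw_{t+1}/dw_t = 1 - lr * dG/dw
    return J, d_theta


train = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 1), (1.0, 1), (2.0, 1)]
dev = [(-1.5, 0), (-0.3, 0), (0.7, 1), (1.5, 1)]
theta = [0.0, 0.0]
J, g = teacher_grad(theta, train, dev)
print(f"dev loss = {J:.4f}, dJ/dtheta = ({g[0]:.5f}, {g[1]:.5f})")
```

In this sketch each teacher gradient costs one forward and one backward pass over the unrolled training, and the result can be checked against finite differences of J. Because every wt is kept on a tape, memory grows linearly with the number of unrolled steps.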

In summary, in this work we extended the “Learning to Teach” framework to the automatic design of loss functions. By drawing on how human teachers work and using efficient optimization algorithms, we can discover adaptive, dynamic loss functions that train deep neural networks to strong performance. Looking ahead, we see great potential for L2T, both in its theoretical justification and in further empirical evidence.
