NIPS: Oral Session 4 – Jason Yosinski

December 9, 2014
Jason Yosinski | Cornell University

How transferable are features in deep neural networks?

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
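The layer-transfer setup described in the abstract can be sketched roughly as follows. This is a minimal illustration in pure Python, not the authors' implementation: the toy network representation and the names `make_network`, `transfer_layers`, and `fine_tune` are hypothetical, and the option to freeze the copied layers is an assumption made here to contrast with fine-tuning.

```python
import copy
import random


def make_network(layer_sizes, seed):
    """Build a toy network as a list of weight matrices (lists of lists)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def transfer_layers(base, n, fine_tune, seed=0):
    """Copy the first n layers of `base` and randomly reinitialize the rest.

    Returns (layers, trainable), where trainable[i] says whether layer i
    would be updated when training on the target task.
    """
    if not 0 <= n <= len(base):
        raise ValueError("n must be between 0 and the number of layers")
    rng = random.Random(seed)
    layers, trainable = [], []
    for i, weights in enumerate(base):
        if i < n:
            layers.append(copy.deepcopy(weights))  # transferred features
            trainable.append(fine_tune)            # frozen unless fine-tuning
        else:
            # Fresh random weights with the same shape as the base layer.
            layers.append([[rng.gauss(0.0, 0.01) for _ in row] for row in weights])
            trainable.append(True)                 # upper layers always train
    return layers, trainable


# `base` stands in for a network already trained on the base task.
base = make_network([8, 6, 4, 3], seed=1)
frozen, frozen_flags = transfer_layers(base, n=2, fine_tune=False)
tuned, tuned_flags = transfer_layers(base, n=2, fine_tune=True)
```

Sweeping `n` from 0 to the full depth and training each resulting network on the target task is one way to see where transferred features stop helping. The training loop itself is omitted here.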

Research Area
- Artificial intelligence
