In recent years, deep neural networks have been successfully applied to model visual concepts and have achieved competitive performance on many tasks. Despite their impressive performance, traditional deep networks are subjected to the decayed performance under the condition of lacking sufficient training data. This problem becomes extremely severe for deep networks trained on a very small dataset, making them overfitting by capturing nonessential or noisy information in the training set. Toward this end, we propose a novel generalized deep transfer networks (DTNs), capable of transferring label information across heterogeneous domains, textual domain to visual domain. The proposed framework has the ability to adequately mitigate the problem of insufficient training images by bringing in rich labels from the textual domain. Specifically, to share the labels between two domains, we build parameter- and representation-shared layers. They are able to generate domain-specific and shared interdomain features, making this architecture flexible and powerful in capturing complex information from different domains jointly. To evaluate the proposed method, we release a new dataset extended from NUS-WIDE at http://imag.njust.edu.cn/NUS-WIDE-128.html. Experimental results on this dataset show the superior performance of the proposed DTNs compared to existing state-of-the-art methods.