Efficient Self-Supervised Vision Transformers (EsViT)
This is a research project in exploring self-supervised learning (SSL) for computer vision. It aims to learn general-purpose image features from raw pixels without relying on manual supervisions, and the learned networks serve as the backbone of various downstream tasks.…