Self-supervised learning

From Wikipedia, the free encyclopedia

Self-supervised learning (SSL) is a method of machine learning that learns from unlabeled sample data. It can be regarded as an intermediate form between supervised and unsupervised learning, and is based on an artificial neural network.[1] The network learns in two steps. First, a pretext task is solved using pseudo-labels, which help to initialize the network weights.[2][3] Second, the actual task is performed with supervised or unsupervised learning.[4][5][6] Self-supervised learning has produced promising results in recent years and has found practical application in audio processing; it is used by Facebook and others for speech recognition.[7] The primary appeal of SSL is that training can occur with lower-quality, unlabeled data, rather than that it improves ultimate outcomes. Self-supervised learning more closely imitates the way humans learn to classify objects.[8]
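
A minimal sketch of this two-step recipe, assuming PyTorch and a rotation-prediction pretext task (both are illustrative choices, not prescribed by the text): an encoder is first trained on pseudo-labels generated from the unlabeled data itself, then reused for the actual task.

```python
import torch
import torch.nn as nn

# Shared encoder whose weights are initialized by the pretext task.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())

# Step 1: pretext task -- predict which of four rotations was applied.
# The rotation index is a pseudo-label derived from the data itself.
pretext_model = nn.Sequential(encoder, nn.Linear(256, 4))
opt = torch.optim.Adam(pretext_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

unlabeled_batches = [torch.randn(8, 3, 32, 32)]   # placeholder unlabeled images
for images in unlabeled_batches:
    k = torch.randint(0, 4, (images.size(0),))    # pseudo-labels: 0..3
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    loss = loss_fn(pretext_model(rotated), k)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: actual task -- reuse the pretrained encoder and train a new head
# with supervised (or unsupervised) learning on the real objective.
classifier = nn.Sequential(encoder, nn.Linear(256, 10))
```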

Types

Training data can be divided into positive examples and negative examples. Positive examples are those that match the target. For example, if the task is to identify birds, the positive training data are those pictures that contain birds. Negative examples are those that do not.[9]

Contrastive SSL

Contrastive SSL uses both positive and negative examples. Its loss function minimizes the distance between pairs of positive samples while maximizing the distance between pairs of negative samples.[9]
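
A minimal sketch of a contrastive loss of this kind, assuming PyTorch; the InfoNCE-style formulation, the use of other batch members as negatives, and the temperature value are illustrative assumptions, not taken from the cited work.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, temperature=0.1):
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same batch."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature        # pairwise similarities
    targets = torch.arange(z_a.size(0))         # positives sit on the diagonal
    # Cross-entropy pulls matching (positive) pairs together and pushes
    # all non-matching (negative) pairs apart.
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```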

Non-contrastive SSL

Non-contrastive SSL (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges on a useful local minimum rather than reaching the expected identity function with zero loss. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side.[9]
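
A minimal sketch of this arrangement, assuming PyTorch: only positive pairs are used, the online branch carries the extra predictor, and a stop-gradient blocks back-propagation on the target branch. The network sizes and mean-squared-error objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

online_encoder = nn.Linear(128, 64)   # trained by back-propagation
target_encoder = nn.Linear(128, 64)   # typically a slowly updated copy in practice
predictor = nn.Linear(64, 64)         # exists only on the online side

def non_contrastive_loss(view_a, view_b):
    online = predictor(online_encoder(view_a))
    with torch.no_grad():              # stop-gradient: no back-propagation on the target side
        target = target_encoder(view_b)
    return F.mse_loss(F.normalize(online, dim=1), F.normalize(target, dim=1))

loss = non_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```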

Comparison with other forms of machine learning

SSL belongs to supervised learning methods insofar as the goal is to generate a classified output from the input. At the same time, however, it does not require the explicit use of labeled input-output pairs. Instead, correlations, metadata embedded in the data, or domain knowledge present in the input are implicitly and autonomously extracted from the data.[10] These supervisory signals, generated from the data, can then be used for training.[8]

SSL is similar to unsupervised learning in that it does not require labels in the sample data. Unlike unsupervised learning, however, learning is not done using inherent data structures.[10]

Semi-supervised learning combines supervised and unsupervised learning, requiring only a small portion of the learning data be labeled.[3]

In transfer learning, a model designed for one task is reused on a different task.[11]

Examples

Self-supervised learning is particularly suitable for speech recognition. For example, Facebook developed wav2vec, a self-supervised algorithm, to perform speech recognition using two deep convolutional neural networks that build on each other.[7]

Google's Bidirectional Encoder Representations from Transformers (BERT) model is used to better understand the context of search queries.[12]

OpenAI's GPT-3 is an autoregressive language model that can be used in language processing. It can be used to translate texts or answer questions, among other things.[13]

Bootstrap Your Own Latent (BYOL) is an NCSSL method that produced excellent results on ImageNet and on transfer and semi-supervised benchmarks.[14]

DirectPred is an NCSSL method that directly sets the predictor weights instead of learning them via gradient updates.[9]
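
A hedged sketch of that idea, assuming PyTorch: the predictor weight matrix is computed directly from an eigendecomposition of the embedding correlation matrix rather than learned by back-propagation. The exact scaling rule below is an illustrative assumption, not the published DirectPred recipe.

```python
import torch

def set_predictor_weights(embeddings, eps=1e-4):
    """embeddings: (batch, dim) outputs of the online network."""
    # Correlation matrix of the current embeddings.
    corr = embeddings.t() @ embeddings / embeddings.size(0)
    eigvals, eigvecs = torch.linalg.eigh(corr)
    scaled = eigvals.clamp(min=0).sqrt() + eps
    # Predictor shares the eigenbasis of the correlation matrix;
    # its weights are set directly instead of via gradient updates.
    return eigvecs @ torch.diag(scaled) @ eigvecs.t()

W_pred = set_predictor_weights(torch.randn(256, 64))
```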

References

  1. ^ Abshire, Chris (2018-04-06). "Self-Supervised Learning: A Key to Unlocking Self-Driving Cars?". Medium. Retrieved 2021-06-09.
  2. ^ Doersch, Carl; Zisserman, Andrew (October 2017). "Multi-task Self-Supervised Visual Learning". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE: 2070–2079. arXiv:1708.07860. doi:10.1109/iccv.2017.226. ISBN 978-1-5386-1032-9. S2CID 473729.
  3. ^ a b Beyer, Lucas; Zhai, Xiaohua; Oliver, Avital; Kolesnikov, Alexander (October 2019). "S4L: Self-Supervised Semi-Supervised Learning". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE: 1476–1485. arXiv:1905.03670. doi:10.1109/iccv.2019.00156. ISBN 978-1-7281-4803-8. S2CID 167209887.
  4. ^ Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). "Unsupervised Visual Representation Learning by Context Prediction". 2015 IEEE International Conference on Computer Vision (ICCV). IEEE: 1422–1430. arXiv:1505.05192. doi:10.1109/iccv.2015.167. ISBN 978-1-4673-8391-2. S2CID 9062671.
  5. ^ Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (April 2018). "Fast and robust segmentation of white blood cell images by self-supervised learning". Micron. 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969.
  6. ^ Gidaris, Spyros; Bursuc, Andrei; Komodakis, Nikos; Perez, Patrick Perez; Cord, Matthieu (October 2019). "Boosting Few-Shot Visual Learning With Self-Supervision". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE: 8058–8067. arXiv:1906.05186. doi:10.1109/iccv.2019.00815. ISBN 978-1-7281-4803-8. S2CID 186206588.
  7. ^ a b "Wav2vec: State-of-the-art speech recognition through self-supervision". ai.facebook.com. Retrieved 2021-06-09.
  8. ^ a b Bouchard, Louis (2020-11-25). "What is Self-Supervised Learning? | Will machines ever be able to learn like humans?". Medium. Retrieved 2021-06-09.
  9. ^ a b c d "Demystifying a key self-supervised learning technique: Non-contrastive learning". ai.facebook.com. Retrieved 2021-10-05.
  10. ^ a b R., Poornima; L., Ashok (2017). "Problem Based Learning a Shift from Teaching Paradigm to the Learning Paradigm". Indian Journal of Dental Education. 10 (1): 47–51. doi:10.21088/ijde.0974.6099.10117.6. ISSN 0974-6099.
  11. ^ Littwin, Etai; Wolf, Lior (June 2016). "The Multiverse Loss for Robust Transfer Learning". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE: 3957–3966. arXiv:1511.09033. doi:10.1109/cvpr.2016.429. ISBN 978-1-4673-8851-1. S2CID 6517610.
  12. ^ "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2021-06-09.
  13. ^ Wilcox, Ethan; Qian, Peng; Futrell, Richard; Kohita, Ryosuke; Levy, Roger; Ballesteros, Miguel (2020). "Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models". Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics: 4640–4652. arXiv:2010.05725. doi:10.18653/v1/2020.emnlp-main.375. S2CID 222291675.
  14. ^ Grill, Jean-Bastien; Strub, Florian; Altché, Florent; Tallec, Corentin; Richemond, Pierre H.; Buchatskaya, Elena; Doersch, Carl; Pires, Bernardo Avila; Guo, Zhaohan Daniel; Azar, Mohammad Gheshlaghi; Piot, Bilal (2020-09-10). "Bootstrap your own latent: A new approach to self-supervised Learning". arXiv:2006.07733 [cs.LG].
