Scaling language models with more data, compute, and parameters has driven
significant progress in natural language processing. For example, thanks to
scaling, GPT-3 achieved strong results on in-context learning tasks.
However, training...