OpenNMT

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts