New language model from DeepMind

They have a system for incorporating outside information into the model: background text is encoded with a Transformer encoder, and the model draws on that encoding at inference time.
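A rough sketch of the general idea in plain Python. Background text (already turned into toy vectors) goes through an encoder, and the model's hidden states then attend to that encoding through cross-attention. The function names, the one-layer "encoder", and the lack of learned projection matrices are all simplifying assumptions for illustration. This is not DeepMind's actual architecture, and it leaves out how the background text is selected in the first place.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with no learned projections (a simplification)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Each output is a weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs


def add(a: List[Vector], b: List[Vector]) -> List[Vector]:
    # Element-wise residual addition.
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def encode_background(passage: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a Transformer encoder: one self-attention
    layer with a residual connection over the background token vectors."""
    return add(attention(passage, passage, passage), passage)


def augment_with_background(hidden: List[Vector],
                            encoded: List[Vector]) -> List[Vector]:
    """Cross-attention: the model's hidden states (queries) attend over the
    encoded background (keys and values); the result is added back residually."""
    return add(attention(hidden, encoded, encoded), hidden)


if __name__ == "__main__":
    background = [[1.0, 0.0], [0.0, 1.0]]  # toy vectors for retrieved background text
    hidden = [[0.5, 0.5], [1.0, -1.0]]     # toy hidden states for the model's input
    encoded = encode_background(background)
    print(augment_with_background(hidden, encoded))
```

The useful part is the split: the background is encoded once, separately, and the main model only consults it through attention, so the background can be swapped out at inference without retraining the sketch's "model".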

The paper "Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention" describes a similar approach.
