They have a system for incorporating outside information into their model: background information is encoded with a Transformer encoder, and that encoding is then used at inference.
"Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention" describes a similar approach.
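
To make the general idea a bit more concrete, here is a rough sketch of a model "looking up" encoded background information at inference time using plain scaled dot-product attention. This is not the implementation from either work: the `attend` helper, the toy `background` vectors (stand-ins for encoder outputs), and the `query` vector are all made up for illustration, and there are no learned projections.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors.

    The memory vectors act as both keys and values (a simplification).
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)
        for vec in memory
    ]
    weights = softmax(scores)
    # Weighted sum of the memory vectors, one dimension at a time.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, memory))
        for i in range(d)
    ]
    return context, weights


# Hypothetical pre-encoded background passages (stand-ins for encoder outputs).
background = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical representation of the current input at inference time.
query = [4.0, 0.0, 0.0]

context, weights = attend(query, background)
```

The `context` vector is a blend of the background vectors, weighted toward the ones most similar to the query; a real system would feed something like this back into the model alongside its usual hidden states.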