Custom attention mask

That is correct, probably because the context and underlying semantics that these words provide are still very much relevant and help the system generalise better. So simply blocking off all attention might not be the way to go.
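To make "blocking off all attention" concrete, here is a minimal pure-Python sketch of what a hard mask does to the attention weights for a single query. It is not the code from the PR; the function names, the `blocked` flags and the `penalty` parameter are all hypothetical and only there for illustration. It assumes at least one position is left unblocked.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    finite_max = max(s for s in scores if s != float("-inf"))
    exps = [math.exp(s - finite_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores, blocked, penalty=float("-inf")):
    """Add `penalty` to the scores of blocked positions, then normalise.

    penalty=-inf is a hard mask: blocked positions get exactly zero weight.
    A finite negative penalty (an assumption, not something from the PR)
    only down-weights them, so some of their context still gets through.
    """
    biased = [s + penalty if b else s for s, b in zip(scores, blocked)]
    return softmax(biased)


# Raw attention scores of one query against three tokens (made-up numbers).
scores = [2.0, 1.0, 0.5]
blocked = [False, True, False]  # hypothetical: block the second token

hard = masked_attention_weights(scores, blocked)
soft = masked_attention_weights(scores, blocked, penalty=-2.0)
```

With the hard mask the second token contributes nothing at all, which is exactly the situation where its context and semantics are lost. The finite penalty is just one way to picture a less drastic mask; whether it helps in practice is an open question here.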

I’ll send in a WIP PR that people can use to get started. Unfortunately, I do not have the time to add tests or documentation, so I’ll add it “as is”, with a hopefully clear enough description.