How do word_features work?

How does the network understand or use word_features? For example, suppose I label a word as offensive/unimportant with a feature (word|O). How, or where in the code, can I tweak the cost function to penalize such words so that they are not generated in the translation? The use case here is summarization.

Any guidance would be really helpful!

Thanks! :)


On the source side, word features act as additional information by “enriching” the word embeddings. Nothing more.
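To make the "enriching" concrete, here is a minimal sketch in plain Python. It assumes that each feature gets its own small embedding table and that the feature embedding is concatenated to the word embedding (summing them is another option). The vocabularies, dimensions and the `embed` helper are hypothetical, chosen only for illustration; this is not the toolkit's actual code.

```python
import random

# Hypothetical toy vocabularies: one for words, one for a single word feature.
WORD_VOCAB = {"the": 0, "cat": 1, "sat": 2}
FEAT_VOCAB = {"N": 0, "O": 1}  # e.g. N = normal, O = offensive/unimportant

WORD_DIM = 4  # assumed word embedding size
FEAT_DIM = 2  # assumed feature embedding size (usually much smaller)

random.seed(0)
# Each vocabulary has its own embedding table; in a real model both are learned.
word_emb = [[random.uniform(-0.1, 0.1) for _ in range(WORD_DIM)] for _ in WORD_VOCAB]
feat_emb = [[random.uniform(-0.1, 0.1) for _ in range(FEAT_DIM)] for _ in FEAT_VOCAB]


def embed(token: str) -> list:
    """Embed a 'word|feature' token by concatenating the two embeddings (assumed merge)."""
    word, feat = token.split("|")
    return word_emb[WORD_VOCAB[word]] + feat_emb[FEAT_VOCAB[feat]]


source = ["the|N", "cat|O", "sat|N"]
encoder_input = [embed(tok) for tok in source]
print(len(encoder_input), len(encoder_input[0]))  # 3 6
```

The encoder only ever sees these wider input vectors. The feature label by itself adds no term to the loss, so nothing forces the model to avoid flagged words unless the training data or the objective pushes it that way.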

If you want to tweak the cost function, you could use the attention output and penalize high attention probability on these words. However, I would expect the model to learn by itself which words should be ignored.
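A rough sketch of the attention-penalty idea, written in plain Python for readability. It assumes you can get, for each target step, the probability assigned to the reference token and the attention weights over source positions. The function name `penalized_loss` and the `penalty_weight` parameter are hypothetical. In practice the same computation would be written with the framework's tensors so gradients flow through it.

```python
import math


def penalized_loss(target_probs, attention, source_feats, penalty_weight=1.0, flagged="O"):
    """Negative log-likelihood plus a penalty on attention paid to flagged source words.

    target_probs: probability assigned to the reference token at each target step.
    attention:    attention[t][s] = weight of target step t on source position s.
    source_feats: one feature label per source position, e.g. ["N", "O", "N"].
    """
    # Standard NLL of the reference summary.
    nll = -sum(math.log(p) for p in target_probs)

    # 1.0 for source positions carrying the flagged feature, 0.0 otherwise.
    mask = [1.0 if f == flagged else 0.0 for f in source_feats]

    # Total attention mass that all target steps place on flagged source words.
    flagged_mass = sum(w * m for row in attention for w, m in zip(row, mask))

    return nll + penalty_weight * flagged_mass


# Toy usage: 2 target steps, 3 source words, the middle one flagged "O".
probs = [0.5, 0.25]
attn = [[0.2, 0.6, 0.2],
        [0.1, 0.1, 0.8]]
feats = ["N", "O", "N"]
print(penalized_loss(probs, attn, feats, penalty_weight=2.0))  # about 3.479
```

`penalty_weight` trades off fluency against avoiding the flagged words. Keep in mind that attention is only a proxy: a decoder can still emit a flagged word without attending to it, so this discourages such words rather than guaranteeing they never appear.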