Large Vocabulary model


Are there any specifics or benchmarks on GPU RAM requirements with respect to vocabulary size?

I saw in the tutorial/guide that 40k or 50k is usually taken as the baseline vocabulary size.

Is there any experience with much larger sizes?

If not, how do you handle OOV words beyond that limit?

Is it possible to rescore some kind of n-best list with a separate, larger language model?



We haven’t benchmarked this yet, but I don’t think vocabulary size should have much impact, particularly on the source side. Currently RAM is dominated by the RNN size and batch size.
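To see why vocabulary contributes less than you might expect, here is a rough back-of-envelope parameter-memory estimate. All sizes here (500-dim embeddings and hidden states, 2-layer LSTM, fp32) are illustrative assumptions, not OpenNMT defaults or measurements, and this ignores optimizer state and activations, which scale with batch size:

```python
def param_memory_mb(vocab_size, hidden_size=500, emb_size=500,
                    layers=2, bytes_per_param=4):
    """Approximate parameter memory (MB) for an encoder-decoder LSTM."""
    # Embedding tables (source + target) and the output softmax are
    # the parts that scale linearly with vocabulary size.
    embeddings = 2 * vocab_size * emb_size
    softmax = vocab_size * hidden_size
    # Recurrent weights scale with hidden size squared, not vocabulary:
    # 2 directions (enc/dec) * layers * 4 LSTM gates.
    lstm = 2 * layers * 4 * (hidden_size * hidden_size * 2 + hidden_size)
    total = embeddings + softmax + lstm
    return total * bytes_per_param / 1024 ** 2

for v in (50_000, 200_000):
    print(f"vocab {v}: ~{param_memory_mb(v):.0f} MB of parameters")
```

Under these assumptions, going from 50k to 200k vocabulary roughly quadruples the vocabulary-dependent parameters, but the recurrent weights are unchanged, and activation/batch memory still dominates at training time.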

We mostly chose those sizes because there is only marginal benefit to going much larger on those tasks. You should be able to go up to 200k.

(I’ll let others answer the rest.)

The decoder can do UNK replacement: it uses the attention weights to copy words over from the source sentence. You can also specify a phrase table to do the replacement.
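The idea behind attention-based UNK replacement can be sketched as follows. This is an illustrative re-implementation, not OpenNMT's actual code; the token `<unk>`, the toy sentences, and the attention matrix are all made up for the example:

```python
def replace_unk(hyp_tokens, src_tokens, attention, phrase_table=None):
    """Replace each <unk> in the hypothesis with the most-attended
    source word; attention[t] holds the weights over source positions
    for target step t."""
    out = []
    for t, tok in enumerate(hyp_tokens):
        if tok == "<unk>":
            # index of the source word with the highest attention weight
            j = max(range(len(src_tokens)), key=lambda i: attention[t][i])
            src_word = src_tokens[j]
            # use the phrase-table translation if one exists,
            # otherwise copy the source word verbatim
            out.append((phrase_table or {}).get(src_word, src_word))
        else:
            out.append(tok)
    return out

src = ["Pierre", "lives", "in", "Paris"]
hyp = ["<unk>", "habite", "à", "<unk>"]
attn = [[0.90, 0.05, 0.03, 0.02],
        [0.10, 0.70, 0.10, 0.10],
        [0.05, 0.10, 0.80, 0.05],
        [0.02, 0.03, 0.05, 0.90]]
print(replace_unk(hyp, src, attn))  # → ['Pierre', 'habite', 'à', 'Paris']
```

Copying works well for names and numbers; the phrase table covers words that should be translated rather than copied.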

translate.lua supports n-best decoding in Moses format. We don’t include a reranking tool (it goes against the NMT spirit), but you could use any standard LM reranking tool.
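A minimal reranking sketch, assuming the Moses n-best format (`id ||| hypothesis ||| features ||| score`); `lm_score` and the interpolation weight are hypothetical stand-ins for whatever external LM and tuning you plug in:

```python
def parse_nbest(lines):
    """Yield (sent_id, hypothesis, model_score) from Moses n-best lines."""
    for line in lines:
        sid, hyp, _feats, score = [f.strip() for f in line.split("|||")]
        yield int(sid), hyp, float(score)

def rerank(lines, lm_score, lm_weight=0.3):
    """Keep, per sentence id, the hypothesis with the best combined score."""
    best = {}
    for sid, hyp, score in parse_nbest(lines):
        combined = score + lm_weight * lm_score(hyp)
        if sid not in best or combined > best[sid][1]:
            best[sid] = (hyp, combined)
    return {sid: hyp for sid, (hyp, _s) in best.items()}

nbest = [
    "0 ||| the cat sat ||| 0 ||| -1.2",
    "0 ||| cat the sat ||| 0 ||| -1.1",
]
# toy LM that strongly penalizes sentences not starting with "the"
toy_lm = lambda h: 0.0 if h.startswith("the") else -5.0
print(rerank(nbest, toy_lm))  # → {0: 'the cat sat'}
```

In practice you would replace `toy_lm` with a real LM scorer (e.g. a KenLM model queried per sentence) and tune `lm_weight` on a dev set.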