Issues when running the English-German WMT15 training

Personally, I do:

```
learn_bpe.lua -lc …
tokenize -case_feature -joiner_annotate …
```

which is similar to your second option.
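For context, `learn_bpe.lua` learns byte-pair-encoding merge operations from the training corpus. Below is a minimal Python sketch of that merge-learning loop (illustrative only — this is not the OpenNMT code, and the `learn_bpe` function name and word-frequency handling here are my own simplification):

```python
from collections import Counter

def learn_bpe(words, num_merges):
    # Hypothetical minimal BPE learner: represent each word as a tuple of
    # characters plus an end-of-word marker, then repeatedly merge the most
    # frequent adjacent symbol pair.
    vocab = Counter()
    for w in words:
        vocab[tuple(w) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)

        # Rewrite the vocabulary with the chosen pair merged into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "lowest"], 3)
```

The learned merge list is what the tokenizer later applies to split rare words into subword units; lowercasing before learning (as `-lc` does) keeps cased variants from fragmenting the pair statistics.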

I’ve just trained a model with exactly that configuration: 5M segments with a 4x1000 network. Although the BLEU improvement was not dramatic, I noticed that a lot of small “annoying issues”, particularly around number entities, are now resolved. Yes, it’s all about experimenting.

This may be interesting for you:
https://arxiv.org/abs/1703.03906