Personally, I do:
learn_bpe.lua -lc …
tokenize -case_feature -joiner_annotate …
This is similar to your second option.
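
For anyone wondering what those two tokenizer options buy you, here is a rough Python sketch of the idea. It is not the actual OpenNMT tokenizer: each token is lowercased and its original casing is carried in a separate feature, and a joiner mark records where there was no space, so the tokenization can be undone. The function names, the case labels (C, U, L, M, N) and the splitting regex are assumptions made for illustration, and the real tool's rules and output may differ in details.

```python
import re

# Marker characters chosen for this illustration (assumption, not taken from the tool).
JOINER = "\uffed"  # marks "no space before this token"
SEP = "\uffe8"     # separates the token from its case feature


def case_label(token):
    """Return a one-letter case label for a token (hypothetical label set)."""
    if not any(ch.isalpha() for ch in token):
        return "N"  # no letters: punctuation, digits
    if token.islower():
        return "L"  # all lowercase
    if token[0].isupper() and token[1:] == token[1:].lower():
        return "C"  # capitalized
    if token.isupper():
        return "U"  # all uppercase
    return "M"      # mixed case


def tokenize(text):
    """Split into words and punctuation, lowercase, attach case and joiners."""
    out = []
    prev_end = None
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tok = m.group()
        # A token is "glued" if it starts exactly where the previous one ended.
        glued = prev_end is not None and m.start() == prev_end
        out.append((JOINER if glued else "") + tok.lower() + SEP + case_label(tok))
        prev_end = m.end()
    return out


def detokenize(tokens):
    """Undo tokenize(): restore case from the feature and re-attach glued tokens."""
    words = []
    for t in tokens:
        surface, label = t.rsplit(SEP, 1)
        glued = surface.startswith(JOINER)
        word = surface[len(JOINER):] if glued else surface
        if label == "C":
            word = word[0].upper() + word[1:]
        elif label == "U":
            word = word.upper()
        # Mixed case ("M") cannot be restored from the label alone in this toy.
        if glued and words:
            words[-1] += word
        else:
            words.append(word)
    return " ".join(words)


if __name__ == "__main__":
    toks = tokenize("Hello, NMT world!")
    print(" ".join(toks))
    print(detokenize(toks))
```

As far as I understand it, learning BPE on lowercased text (the -lc option) fits with this: "Hello" and "hello" end up sharing the same subword units, and the casing is left to the feature instead of inflating the vocabulary.
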
I’ve just trained a model with exactly that configuration: 5M segments with a 4x1000 network. Although the improvement in BLEU was not dramatic, I have noticed that many small “annoying issues”, particularly with number entities, are now solved. Yes, it’s all about experimenting.