I’m trying to plug a custom loss function into machine translation training. I’ve been stuck for a few days, so I decided to write here.
I tested my loss on a toy task and on my own NMT model.
I simply took NMTLossCompute from the file Loss.py:
And wrote this modification (also note the «if» at the beginning: ONMT batches sometimes have a sentence-length dimension of size 1, and such a batch contains only PAD or EOS).
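To make the guard described above concrete, here is a minimal sketch of such a check. It is not the original modification: the function name `is_degenerate_target`, the token ids `PAD_ID` and `EOS_ID`, and the plain-list layout (time steps x batch) are all assumptions made for illustration.

```python
# Hypothetical token ids; the real values come from the target vocabulary.
PAD_ID = 1
EOS_ID = 3


def is_degenerate_target(target, pad_id=PAD_ID, eos_id=EOS_ID):
    """Return True if the target has exactly one time step and that
    step holds only PAD or EOS tokens.

    `target` is a list of time steps, each a list of token ids,
    i.e. shape (tgt_len, batch) written as nested Python lists.
    """
    if len(target) != 1:
        return False
    return all(tok in (pad_id, eos_id) for tok in target[0])
```

With a real tensor, the same test could be applied after converting it to nested lists, and the loss computation skipped when the check returns True.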
Also, I’m not sure about the _make_shard_state() function; I didn’t get the idea of sharding.
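For context, the general idea behind sharding a loss computation is to slice the decoder outputs along the time axis and compute the loss one slice at a time, so that the large vocabulary-sized scores never exist for the whole sequence at once. The sketch below shows only that idea in plain Python; all names (`token_nll`, `shards`, `sharded_loss`) are hypothetical and this is not the ONMT implementation.

```python
import math


def token_nll(probs, gold):
    """Negative log-likelihood of the gold token at one time step."""
    return -math.log(probs[gold])


def full_loss(step_probs, gold_ids):
    """Loss computed over all time steps in one go."""
    return sum(token_nll(p, g) for p, g in zip(step_probs, gold_ids))


def shards(step_probs, gold_ids, shard_size):
    """Yield consecutive slices of at most `shard_size` time steps."""
    for start in range(0, len(gold_ids), shard_size):
        end = start + shard_size
        yield step_probs[start:end], gold_ids[start:end]


def sharded_loss(step_probs, gold_ids, shard_size):
    """Accumulate the loss slice by slice instead of all at once."""
    total = 0.0
    for probs_chunk, gold_chunk in shards(step_probs, gold_ids, shard_size):
        total += full_loss(probs_chunk, gold_chunk)
    return total


# Toy data: 5 time steps, vocabulary of 3 tokens.
step_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.25, 0.25, 0.5],
    [0.6, 0.3, 0.1],
]
gold_ids = [0, 1, 2, 2, 0]

# Sharded and unsharded totals agree because token-level NLL is a plain sum.
print(sharded_loss(step_probs, gold_ids, 2), full_loss(step_probs, gold_ids))
```

The equality holds here only because token-level NLL is a sum over time steps. A sequence-level objective such as a BLEU-style loss depends on n-grams that span time steps, so it may not decompose across slices in the same way; it may be worth checking how the shard size interacts with such a loss.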
I’m using https://github.com/deepmipt/expected_bleu/blob/master/modules/expectedMultiBleu.py as my loss function.
I started training from a «warmed-up» model and found that after 1 epoch, accuracy on validation and BLEU on test (measured with multi-bleu.perl) didn’t change AT ALL.
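When metrics do not move at all, one quick sanity check is whether the weights themselves changed between two checkpoints. The helper below is a hypothetical sketch of that comparison on plain Python data; the name `max_abs_change` and the snapshot format (parameter name mapped to a flat list of floats) are assumptions.

```python
def max_abs_change(before, after):
    """Return, per parameter name, the largest absolute difference
    between two snapshots of flattened parameter values."""
    changes = {}
    for name, old_values in before.items():
        new_values = after[name]
        changes[name] = max(
            (abs(new - old) for old, new in zip(old_values, new_values)),
            default=0.0,
        )
    return changes


# Toy snapshots. With PyTorch, a snapshot could be built (assumption) as
# {k: v.flatten().tolist() for k, v in model.state_dict().items()}.
before = {"w": [0.1, 0.2], "b": [0.0]}
after = {"w": [0.1, 0.25], "b": [0.0]}

for name, change in max_abs_change(before, after).items():
    print(name, change)
```

If every parameter reports a change of exactly zero, the gradient of the new loss is probably not reaching the optimizer; if the changes are non-zero but tiny, the learning rate or the scale of the loss may be the place to look.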
This is how I launched training:
train.py -data data/IWSLD_prepro_wmixerprep -save_model demo-model -train_from demo-model_acc_61.42_ppl_8.40_e10.pt -learning_rate 0.1 -gpuid 0
I tried a bunch of different learning rates.