Custom loss. How to.

I’m trying to plug a custom loss function into machine translation. I’ve been stuck for a few days, so I decided to write here.
I tested my loss on a toy task and on my own NMT.
I simply took NMTLossCompute from file
And wrote this modification (also note the «if» at the beginning: ONMT batches sometimes have a sentence-length dimension of size 1 that contains only PAD or EOS)
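
For illustration only (this is not the modification from the post), a guard of the kind described could look like the sketch below. It assumes a target tensor of shape (tgt_len, batch); the PAD_IDX and EOS_IDX values and the helper name is_degenerate_target are hypothetical, and the real indices should be read from the target vocabulary.

```python
import torch

PAD_IDX = 1  # assumption: hypothetical padding index, look it up in your vocab
EOS_IDX = 3  # assumption: hypothetical end-of-sentence index


def is_degenerate_target(target, pad_idx=PAD_IDX, eos_idx=EOS_IDX):
    """Return True if target has a single time step holding only PAD or EOS.

    target: LongTensor of shape (tgt_len, batch).
    """
    if target.size(0) != 1:
        return False
    only_special = (target == pad_idx) | (target == eos_idx)
    return bool(only_special.all())


# One time step, only PAD/EOS tokens: nothing useful to learn from.
empty_batch = torch.tensor([[PAD_IDX, EOS_IDX, PAD_IDX]])
# One time step, but it contains a real token (id 5 here).
real_batch = torch.tensor([[PAD_IDX, 5]])

print(is_degenerate_target(empty_batch))
print(is_degenerate_target(real_batch))
```

A custom loss can call such a check first and skip the batch or fall back to the standard loss, instead of running a computation that is undefined for an empty sentence.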

Also, I’m not sure about the _make_shard_state() function. I didn’t get the idea of sharding.
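
On sharding in general: the usual motivation is memory. Projecting every decoder state onto the vocabulary produces a (steps * batch, vocab) tensor, so the loss is computed on a few time steps at a time and the gradients are accumulated. The sketch below shows that idea in plain PyTorch. It is not OpenNMT's code; generator, dec_out, the toy sizes and both function names are stand-ins chosen for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
steps, batch, hidden, vocab = 6, 3, 8, 20  # toy sizes, chosen arbitrarily

generator = nn.Linear(hidden, vocab)  # stand-in for the output projection
criterion = nn.CrossEntropyLoss(reduction="sum")
dec_out = torch.randn(steps, batch, hidden, requires_grad=True)  # stand-in decoder states
target = torch.randint(0, vocab, (steps, batch))


def full_backward(dec_out, target):
    """Reference: build the logits for all time steps at once."""
    logits = generator(dec_out)
    loss = criterion(logits.reshape(-1, vocab), target.reshape(-1))
    loss.backward()
    return loss.item()


def sharded_backward(dec_out, target, shard_size):
    """Same loss, but only shard_size time steps of logits exist at a time."""
    cut = dec_out.detach().requires_grad_(True)  # cut the graph at the decoder output
    total = 0.0
    for out_shard, tgt_shard in zip(cut.split(shard_size), target.split(shard_size)):
        logits = generator(out_shard)
        loss = criterion(logits.reshape(-1, vocab), tgt_shard.reshape(-1))
        loss.backward()  # frees this shard's graph; gradients add up in cut.grad
        total += loss.item()
    dec_out.backward(cut.grad)  # pass the accumulated gradient on to the rest of the model
    return total


full_loss = full_backward(dec_out, target)
full_grad = dec_out.grad.clone()
full_weight_grad = generator.weight.grad.clone()

# Reset gradients before repeating the computation shard by shard.
dec_out.grad = None
generator.zero_grad()

sharded_loss = sharded_backward(dec_out, target, shard_size=2)
loss_matches = abs(full_loss - sharded_loss) < 1e-3
grad_matches = torch.allclose(full_grad, dec_out.grad, atol=1e-5)
weight_grad_matches = torch.allclose(full_weight_grad, generator.weight.grad, atol=1e-5)
print(loss_matches, grad_matches, weight_grad_matches)
```

The point for a custom loss is that every tensor the loss needs per time step has to be split along the same dimension as the decoder output, so that each shard sees matching slices.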
I’m using this as my loss function.
I started training from a «warmed up» model and found that after 1 epoch the accuracy on validation and the BLEU on test (measured with multi-bleu.perl) didn’t change AT ALL.
This is how I launched training: -data data/IWSLD_prepro_wmixerprep -save_model demo-model -train_from -learning_rate 0.1 -gpuid 0
I tried a bunch of different learning rates.
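
When metrics stay identical across learning rates, one thing worth ruling out is that the custom loss is disconnected from the model parameters, for example because it goes through argmax or a detached tensor. This is only a hypothesis about the situation above. The check below is a minimal sketch on a toy model; params_without_grad is a hypothetical helper, not part of OpenNMT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def params_without_grad(model, loss):
    """Return names of parameters that receive no (or an all-zero) gradient from loss."""
    names = [name for name, _ in model.named_parameters()]
    if not loss.requires_grad:
        return names  # the loss is not attached to the graph at all
    model.zero_grad()
    loss.backward()
    return [
        name
        for name, param in model.named_parameters()
        if param.grad is None or float(param.grad.abs().sum()) == 0.0
    ]


torch.manual_seed(0)
model = nn.Linear(4, 3)  # toy stand-in for the real model
inputs = torch.randn(5, 4)
labels = torch.randint(0, 3, (5,))
log_probs = torch.log_softmax(model(inputs), dim=-1)

# Differentiable loss: gradients reach every parameter.
connected_loss = F.nll_loss(log_probs, labels)
# Loss built on argmax: the result no longer depends differentiably on the model.
disconnected_loss = (log_probs.argmax(dim=-1) != labels).float().mean()

missing_for_connected = params_without_grad(model, connected_loss)
missing_for_disconnected = params_without_grad(model, disconnected_loss)
print(missing_for_connected)
print(missing_for_disconnected)
```

If the list is non-empty for the real loss, the optimizer has nothing to update for those parameters, whatever learning rate is used.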