I have been training a Transformer model on OpenNMT-tf, but the loss does not seem to decrease even after almost 316,000 steps, and the BLEU score is not improving either. I am using a dataset containing almost 3.5M training pairs.
Can you post the full training log?
I am using Google Colab and I don't have any log files.
Can you provide more information then? The command line you used, the training and model configurations, the current loss and BLEU values, etc.
- Command line: `!onmt-main --model_type Transformer --config data.yaml --auto_config train --with_eval`
- Training config:

```yaml
train:
  save_checkpoints_steps: 1000
  maximum_features_length: 50
  maximum_labels_length: 50
  batch_size: 4096

eval:
  external_evaluators: BLEU

params:
  dropout: 0.3
  average_loss_in_time: true

infer:
  batch_size: 32
```
“”" - Model:I am using the transformer model from opennmt documentation
- Current loss: 2.26, BLEU score: 10.4
I am running validation every 5000 steps, but the loss is not converging.
I suggest removing the dropout parameter from your configuration. The default dropout values should work well.
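For reference, the `params` block would then look like this. This is a sketch assuming you keep `--auto_config`, which should then supply the model's default dropout (worth verifying against your OpenNMT-tf version):

```yaml
# params block with the explicit dropout removed;
# with --auto_config, the Transformer's default dropout is used instead.
params:
  average_loss_in_time: true
```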
Ok, I will try doing that.
One question though: is that the reason why the loss is not converging?
It’s one possible reason.