Hi
I have trained a Spanish-English NMT system with the OpenNMT-py Transformer model, using its default Transformer hyperparameters, on 6 GPUs (for around 6 days).
The training dataset is ~71 million sentence pairs, preprocessed with the Moses Perl tokenizer and the BPE segmentation scripts included in the OpenNMT-py package. Testing on the UN test set, I get a BLEU score of 63.1.
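(For context, by "default Transformer hyperparameters" I mean the standard Transformer recipe from the OpenNMT-py FAQ, i.e. a training command roughly like the one below; the data/model paths are placeholders and the exact flags may differ slightly depending on the OpenNMT-py version.)

# base Transformer recipe from the OpenNMT-py FAQ, run on 6 GPUs
python train.py -data data/es-en -save_model models/es-en \
    -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
    -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
    -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 \
    -valid_steps 10000 -save_checkpoint_steps 10000 \
    -world_size 6 -gpu_ranks 0 1 2 3 4 5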
I then used the same preprocessed dataset with OpenNMT-tf, also on 6 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 onmt-main train_and_eval --model_type Transformer --config config_train.yaml --auto_config --num_gpus 6
The options set in the config file (in addition to the training, validation and vocabulary file paths) are:
save_checkpoints_steps: 5000
keep_checkpoint_max: 10
save_summary_steps: 100
train_steps: 1000000
maximum_features_length: 100
maximum_labels_length: 100
num_threads: 8
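(Roughly, config_train.yaml is laid out as below; the file paths are placeholders for my actual training, validation and vocabulary files, and the exact key names can vary between OpenNMT-tf versions.)

model_dir: run_esen/

data:
  train_features_file: data/train.es.bpe
  train_labels_file: data/train.en.bpe
  eval_features_file: data/dev.es.bpe
  eval_labels_file: data/dev.en.bpe
  source_words_vocabulary: data/vocab.es
  target_words_vocabulary: data/vocab.en

train:
  save_checkpoints_steps: 5000
  keep_checkpoint_max: 10
  # ... and the remaining options listed above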
I did not see any improvement in BLEU on either the development set or the test set after the 3rd day. From step 225,000 to 610,000 there was no improvement in BLEU on the test set, which stayed at 43.9 (compared to 63.1 with OpenNMT-py).
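(For completeness, a sketch of how a test-set score at a given checkpoint can be produced; the checkpoint path and file names below are placeholders, BPE is undone with the usual sed pattern, and multi-bleu.perl is the Moses scoring script.)

# decode the BPE-segmented test set with a specific checkpoint
CUDA_VISIBLE_DEVICES=0 onmt-main infer --config config_train.yaml --auto_config \
    --checkpoint_path run_esen/model.ckpt-610000 \
    --features_file test.es.bpe --predictions_file test.en.hyp.bpe
# undo BPE and score against the tokenized reference
sed -r 's/(@@ )|(@@ ?$)//g' test.en.hyp.bpe > test.en.hyp
perl multi-bleu.perl test.en.ref < test.en.hyp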
Is there anything I am missing in my use of OpenNMT-tf?
Thanks