Hello, I am trying to train English-Spanish translation models on 35 million sentence pairs.
I created a 32K SentencePiece vocabulary, shared between the two languages.
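The SentencePiece setup is roughly as follows (a sketch; model_type, character_coverage, and the file names are illustrative rather than my exact command):

# train a single 32K model on English+Spanish text, then encode both sides with it
spm_train --input=train.en,train.es --model_prefix=sentencepiece_en-es --vocab_size=32000 --model_type=bpe --character_coverage=1.0
spm_encode --model=sentencepiece_en-es.model --output_format=piece < train.en > bpe/train.en
spm_encode --model=sentencepiece_en-es.model --output_format=piece < train.es > bpe/train.es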
Fairseq Training:
I preprocessed the data (converted it to binary/index format) following the steps here https://github.com/pytorch/fairseq/tree/master/examples/translation, using the SentencePiece vocabulary, and trained with this command:
fairseq-train --fp16 fairseq/data-bin/en-es/ --source-lang en --target-lang es --arch transformer_wmt_en_de --share-all-embeddings --criterion label_smoothed_cross_entropy --optimizer adam --adam-betas '(0.9, 0.98)' --warmup-updates 4000 --save-dir fairseq/checkpoints/en-es/ --no-progress-bar --log-interval 1000 --ddp-backend=no_c10d --clip-norm 0.0 --lr-scheduler inverse_sqrt --lr 0.0007 --label-smoothing 0.1 --max-tokens 4096 --update-freq 8
This uses the transformer base architecture. After 15 epochs I get a BLEU score of 33.9 on the WMT13 test set.
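For reference, the scoring pipeline on the fairseq side looks roughly like this (a sketch; the checkpoint name, beam size, and detokenization via --remove-bpe=sentencepiece are illustrative rather than my literal commands):

# generate, pull out hypotheses in original order, score detokenized output with sacrebleu
fairseq-generate fairseq/data-bin/en-es/ --path fairseq/checkpoints/en-es/checkpoint_best.pt --source-lang en --target-lang es --beam 5 --batch-size 128 --remove-bpe=sentencepiece > gen.out
grep ^H gen.out | sed 's/^H-//' | sort -n | cut -f3- > hyp.detok.es
sacrebleu -t wmt13 -l en-es < hyp.detok.es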
OpenNMT-py:
I also tried to do the same with OpenNMT-py.

Preprocessing command:
onmt_preprocess -train_src ../data/parallel_data/training_data/processed_data/bpe/train.en -train_tgt ../data/parallel_data/training_data/processed_data/bpe/train.es -valid_src ../data/parallel_data/valid_data/bpe/valid.en -valid_tgt ../data/parallel_data/valid_data/bpe/valid.es -save_data data/en-es/ --num_threads 16 --src_vocab ../vocabulary/opennmt_vocab/sentencepiece_en-es.vocab --tgt_vocab ../vocabulary/opennmt_vocab/sentencepiece_en-es.vocab --src_vocab_size 32000 --tgt_vocab_size 32000 --share_vocab
I tried training with a couple of different commands. First:
onmt_train -data data/en-es/ -save_model checkpoints/en-es/ -layers 6 -rnn_size 512 -word_vec_size 512 -share_decoder_embeddings -share_embeddings -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 75000 --max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 5000 -save_checkpoint_steps 5000 -world_size 8 -gpu_ranks 0 1 2 3 4 5 6 7 -log_file log/en-es/en_en.log -exp transformer_base_en_es -report_every 1000
This gave a BLEU of 30.9 (I followed https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model).
Then I changed warmup_steps to 4000 and accum_count to 8 and got a BLEU of 31.2.
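Both systems are scored the same way, with sacrebleu on detokenized output; on the OpenNMT-py side it looks roughly like this (again a sketch; the checkpoint name, test file paths, and beam size are illustrative):

# translate the BPE'd test set, undo SentencePiece, score against the WMT13 reference
onmt_translate -model checkpoints/en-es/_step_75000.pt -src test/bpe/test.en -output pred.bpe.es -gpu 0 -beam_size 5 -batch_size 32
spm_decode --model=sentencepiece_en-es.model --input_format=piece < pred.bpe.es > pred.detok.es
sacrebleu -t wmt13 -l en-es < pred.detok.es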
These BLEU scores are lower than fairseq's. All OpenNMT scores are from the best checkpoint (the one with the lowest validation perplexity), and all trainings were done on 8 GPUs.
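Would checkpoint averaging be expected to change this comparison? The fairseq translation examples average the last few epoch checkpoints with something like the following (run from the fairseq repo root; the number of checkpoints and the paths are illustrative), and I assume the equivalent could be done on the OpenNMT-py side:

# average the last 5 epoch checkpoints into a single model before generation
python scripts/average_checkpoints.py --inputs fairseq/checkpoints/en-es/ --num-epoch-checkpoints 5 --output fairseq/checkpoints/en-es/averaged.pt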
I observed a similar pattern on a different task as well (OpenNMT-py scoring a bit lower than fairseq). Is there a reason why it comes out lower than fairseq, or am I missing something?
Thanks in advance.