Identical settings for OpenNMT-py and Tensor2Tensor

pytorch

(Dominik Macháček) #1

Hello,
can anyone share OpenNMT-py hyperparameter settings identical to the Tensor2Tensor transformer_big_single_gpu hyperparameter set (ideally version 1.2.9)? I tried my best, and OpenNMT-py is still 0.5 BLEU worse (and the dev BLEU learning curve has a different and rather strange shape, by the way). Is it even possible without implementing new features? If not, I'm OK with the current state.
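
For context, this is my understanding of what transformer_big_single_gpu amounts to in T2T, recalled from the T2T source around that era, so the exact values in 1.2.9 may differ; a sketch, not the authoritative definition:

    # Sketch of the T2T transformer_big_single_gpu hparams as I understand them
    # (values recalled from the T2T source; version 1.2.9 may differ in details).
    def transformer_big_single_gpu_sketch():
        return {
            # inherited from transformer_base / transformer_big
            "num_hidden_layers": 6,
            "hidden_size": 1024,        # "big" model width (vs. 512 in base)
            "filter_size": 4096,        # feed-forward inner size
            "num_heads": 16,
            "batch_size": 4096,         # tokens per batch in the base hparams
            "label_smoothing": 0.1,
            # overrides specific to the *_single_gpu variant
            "layer_prepostprocess_dropout": 0.1,
            "learning_rate_warmup_steps": 16000,
        }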


(Guillaume Klein) #2

Hello,

What options did you use so far?


(Dominik Macháček) #3
    python OpenNMT-py/preprocess.py \
            -train_src $DATA_DIR/train.de \
            -train_tgt $DATA_DIR/train.cs \
            -valid_src $DATA_DIR/dev.de \
            -valid_tgt $DATA_DIR/dev.cs \
            -save_data $DATA_DIR/data \
            -src_vocab_size 150000 \
            -tgt_vocab_size 150000 \
            -src_vocab $AVOCAB \
            -max_shard_size 134217728 \
            -src_seq_length 1500 \
            -tgt_seq_length 1500 \
            -share_vocab


    LAYERS=6
    RNNS=512
    WVS=512
    EPOCHS=40
    MGB=32
    BS=1500
    GPU="-gpuid 0"


    python OpenNMT-py/train.py \
        -data $DATA_DIR/data \
        -swap_every 0 \
        -save_model $TRAIN_DIR/model \
        -layers $LAYERS \
        -rnn_size $RNNS \
        -word_vec_size $WVS \
        -encoder_type transformer \
        -decoder_type transformer \
        -position_encoding \
        -epochs $EPOCHS \
        -max_generator_batches 32 \
        -dropout 0.1 \
        -batch_size $BS \
        -batch_type tokens -normalization tokens -accum_count 4 \
        -optim adam -adam_beta1 0.9 -adam_beta2 0.998 \
        -decay_method noam -warmup_steps 60000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0 -param_init_glorot \
        -label_smoothing 0.1 \
        -exp $PROBLEM \
        -tensorboard \
        -tensorboard_log_dir $TRAIN_DIR $GPU
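
For what it's worth, both toolkits should be computing the same "noam" schedule for -decay_method noam, up to implementation details; a minimal sketch with the numbers from the command above (2 is the -learning_rate factor, 512 the model dimension):

    # "noam" learning-rate schedule from the Transformer paper:
    #   lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    def noam_lr(step, factor=2.0, d_model=512, warmup_steps=60000):
        step = max(step, 1)
        return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    # Peak LR is reached at step == warmup_steps:
    print(noam_lr(60000))   # ~2 * 512**-0.5 * 60000**-0.5, i.e. about 3.6e-4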

(Dominik Macháček) #4

The data were preprocessed with the T2T SubwordTextEncoder using a 100k shared vocabulary and converted into token indices (decimal numbers), so the input data are identical for both systems.
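
Roughly, the T2T-side preprocessing I mean looks like this (a sketch; the vocabulary file name is just a placeholder):

    # Sketch of the subword preprocessing described above
    # (SubwordTextEncoder API as in tensor2tensor 1.2.x; file name is hypothetical).
    from tensor2tensor.data_generators import text_encoder

    encoder = text_encoder.SubwordTextEncoder("vocab.decs.100k")  # hypothetical shared vocab file
    ids = encoder.encode("Ein Beispielsatz .")          # -> list of subword ids (ints)
    print(" ".join(str(i) for i in ids))                # each token written as a decimal number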