Hello,
can anyone share hyperparameter settings identical to the Tensor2Tensor transformer_big_single_gpu hyperparameter set (ideally version 1.2.9)? I tried my best and OpenNMT-py is still 0.5 BLEU worse (and the dev BLEU learning curve has a different and quite strange shape, by the way). Is it even possible without implementing new features? If not, I’m OK with the current state.
Hello,
What options have you used so far?
python OpenNMT-py/preprocess.py \
-train_src $DATA_DIR/train.de \
-train_tgt $DATA_DIR/train.cs \
-valid_src $DATA_DIR/dev.de \
-valid_tgt $DATA_DIR/dev.cs \
-save_data $DATA_DIR/data \
-src_vocab_size 150000 \
-tgt_vocab_size 150000 \
-src_vocab $AVOCAB \
-max_shard_size 134217728 \
-src_seq_length 1500 \
-tgt_seq_length 1500 \
-share_vocab
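One thing worth verifying is that the shared vocabulary OpenNMT-py ends up with actually matches the 100k subword vocab on the T2T side. A quick sanity check (a minimal sketch; $AVOCAB and $DATA_DIR are the variables from above, and the layout of data.vocab.pt as (name, vocab) pairs is my assumption about this OpenNMT-py version):
# number of entries in the external vocab passed via -src_vocab
wc -l $AVOCAB
# sizes of the vocabularies preprocess.py actually saved
python - <<EOF
import torch
# assumed: data.vocab.pt holds (name, torchtext vocab) pairs in this OpenNMT-py version
for name, vocab in torch.load("$DATA_DIR/data.vocab.pt"):
    print(name, len(vocab))
EOF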
LAYERS=6        # encoder/decoder depth
RNNS=512        # model (hidden) dimension
WVS=512         # word embedding size
EPOCHS=40
MGB=32          # max generator batches
BS=1500         # batch size in tokens (see -batch_type tokens)
GPU="-gpuid 0"
python OpenNMT-py/train.py \
-data $DATA_DIR/data \
-swap_every 0 \
-save_model $TRAIN_DIR/model \
-layers $LAYERS \
-rnn_size $RNNS \
-word_vec_size $WVS \
-encoder_type transformer \
-decoder_type transformer \
-position_encoding \
-epochs $EPOCHS \
-max_generator_batches $MGB \
-dropout 0.1 \
-batch_size $BS \
-batch_type tokens -normalization tokens -accum_count 4 \
-optim adam -adam_beta2 0.998 \
-adam_beta1 0.9 \
-decay_method noam -warmup_steps 60000 -learning_rate 2 \
-max_grad_norm 0 -param_init 0 -param_init_glorot \
-label_smoothing 0.1 \
-exp $PROBLEM \
-tensorboard \
-tensorboard_log_dir $TRAIN_DIR $GPU
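For comparison, and going from memory of the T2T source (so please verify against tensor2tensor/models/transformer.py in 1.2.9), transformer_big_single_gpu inherits transformer_big and changes only a few things: hidden_size=1024, filter_size=4096, num_heads=16, layer_prepostprocess_dropout=0.1, warmup of 16000 steps, and adam_beta2=0.998. Your flags correspond to the base model size (512) with a much longer warmup (60000). A closer approximation might look like this (assuming your OpenNMT-py version already has the -heads and -transformer_ff options):
python OpenNMT-py/train.py \
-data $DATA_DIR/data \
-save_model $TRAIN_DIR/model \
-layers 6 \
-rnn_size 1024 \
-word_vec_size 1024 \
-transformer_ff 4096 \
-heads 16 \
-encoder_type transformer \
-decoder_type transformer \
-position_encoding \
-epochs $EPOCHS \
-max_generator_batches $MGB \
-dropout 0.1 \
-batch_size $BS \
-batch_type tokens -normalization tokens -accum_count 4 \
-optim adam -adam_beta1 0.9 -adam_beta2 0.998 \
-decay_method noam -warmup_steps 16000 -learning_rate 2 \
-max_grad_norm 0 -param_init 0 -param_init_glorot \
-label_smoothing 0.1 $GPU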
The data were preprocessed with the T2T SubwordTextEncoder using a 100k shared vocabulary and converted to integer indices, so the input data are identical for both toolkits.
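For anyone reproducing that step, the subword encoding looks roughly like the following (a minimal sketch; the vocabulary filename is a placeholder, but SubwordTextEncoder and its encode/decode methods live in tensor2tensor.data_generators.text_encoder):
python - <<'EOF'
from tensor2tensor.data_generators.text_encoder import SubwordTextEncoder

# load the shared 100k subword vocabulary built by T2T (placeholder filename)
encoder = SubwordTextEncoder("vocab.decs.100000")

line = "Ein Beispielsatz ."
ids = encoder.encode(line)             # list of integer subword ids
print(" ".join(str(i) for i in ids))   # one line of space-separated indices
assert encoder.decode(ids) == line     # the subword encoding is invertible
EOF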