I’ve been trying to train an English–Spanish model on the TED IWSLT 2016 dataset. My training and development sets contain 218845 and 873 samples respectively. I set it to train for ~30 epochs with batch size 64 (102583 steps) using the following command:
python train.py -data data/$dataset -save_model models/$model -coverage_attn -word_vec_size 512 -layers 3 -rnn_size 512 -rnn_type LSTM -encoder_type brnn -batch_size 64 -dropout 0.25 -input_feed 1 -global_attention mlp -optim adam -learning_rate 0.0001 -gpu_ranks 0 -train_steps 102583
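For reference, the `-train_steps` value above is just the epoch count converted into optimizer steps (samples per epoch divided by batch size, times epochs), as a quick sanity check shows:

```python
# Sanity check on the -train_steps value: 30 epochs over 218845 samples
# with batch size 64, using integer division as an approximation.
samples, batch_size, epochs = 218845, 64, 30
train_steps = samples * epochs // batch_size
print(train_steps)  # 102583, matching the value passed to train.py
```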
I tokenized the dataset using BPE with 20k merge operations and a joint vocabulary. OpenNMT reports the vocabulary sizes as 17138 for the source (English) and 21763 for the target (Spanish).
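To make the preprocessing concrete, here is a minimal sketch of how BPE segments a word given a learned merge table. The two-entry merge table below is a made-up toy example for illustration only; the real table is the 20k-merge codes file learned from the training data, and both training and test input must be segmented with that same file:

```python
def apply_bpe(word, merges):
    """Greedily apply BPE merges (in priority order) to a single word."""
    symbols = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Find the adjacent pair with the highest-priority (lowest-rank) merge.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, i = min(pairs)
        if best_rank == float("inf"):
            break  # no learned merge applies anymore
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = [("l", "o"), ("lo", "w")]   # toy merge table, not the real codes
print(apply_bpe("lowest", merges))   # ['low', 'e', 's', 't']
```

The key point is that the segmentation is fully determined by the codes file, so any mismatch between the codes used at training time and at inference time leaves the model seeing subwords it was never trained on.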
During training, validation accuracy increases from 81.297 at the first validation point to 83.38 at the last, and perplexity decreases from 1.81916 to 1.7422. Here’s the full training log: https://pastebin.com/cUfkgsLh
At inference time, however, I can’t get any meaningful output; the model predicts only a few characters, as in:
SENT 1: ['thank', 'you', 'so', 'much', ',', 'chris', '.']
PRED 1: l a
PRED SCORE: -0.5345

SENT 2: ['i', 'have', 'been', 'blown', 'away', 'by', 'this', 'conference', '.']
PRED 2: l a
PRED SCORE: -0.3953

SENT 3: ['i', 'flew', 'on', 'air', 'force', 'two', 'for', 'eight', 'years', '.']
PRED 3: l a
PRED SCORE: -0.1203

SENT 4: ['now', 'i', 'have', 'to', 'take', 'off', 'my', 'shoes', 'or', 'boots', 'to', 'get', 'on', 'an', 'airplane', '!']
PRED 4: @ @
PRED SCORE: -0.2087

SENT 5: ['i', "'ll", 'tell', 'you', 'one', 'quick', 'story', 'to', 'illustrate', 'what', 'that', "'s", 'been', 'like', 'for', 'me', '.']
PRED 5: l a
PRED SCORE: -0.2856
I’ve previously trained reasonable models with OpenNMT, and the pretrained models from the website also work fine for me. However, I can’t get it working with the dataset I have right now. I’d really appreciate your help. Thanks.