OpenNMT

OpenNMT-tf default params

I’ve been using OpenNMT-py for a while and it’s pretty straightforward. I have been trying to use OpenNMT-tf for a few days, but I can’t seem to get any decent results, or any results at all. I have tried the Transformer model, and both the Adam and SGD optimisers, each with and without SentencePiece, all to no avail.

When I’m training I see the perplexity going down (yet it is still really high: around 15 after 75000 steps with a batch size of 128 examples, not tokens), and then when I try to infer I only get “unk” for every word.
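To put the step count in perspective, the number of passes over the data can be estimated from the step count and the batch size. This is a rough sketch, assuming the 200980 training examples reported in the log further down and no gradient accumulation; the function name is only illustrative:

```python
def epochs_seen(steps, batch_size, num_examples):
    """Approximate number of passes over the training set."""
    return steps * batch_size / num_examples


# 75000 steps with 128 examples per batch on 200980 sentence pairs.
print(round(epochs_seen(75000, 128, 200980), 1))  # about 47.8
```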

I believe the problem is partially related to the fact that I’m using SentencePiece. I have noticed that when the training loads, no embeddings are specified. This is something I never had to define in OpenNMT-py, so I’m not sure if I’m missing something here.

If anyone could point out what I’m doing wrong, it would be greatly appreciated.
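One way to narrow down the “unk” output is to measure how many tokens of the training text are actually present in the vocabulary file. The sketch below assumes whitespace-separated tokens and a vocabulary with one token per line; `load_vocab` and `oov_rate` are hypothetical helpers, not part of OpenNMT-tf:

```python
def load_vocab(lines):
    """Return the set of tokens from an iterable of vocabulary lines."""
    return {line.rstrip("\n") for line in lines if line.strip()}


def oov_rate(sentences, vocab):
    """Fraction of whitespace-separated tokens that are not in vocab."""
    total = 0
    unknown = 0
    for sentence in sentences:
        for token in sentence.split():
            total += 1
            if token not in vocab:
                unknown += 1
    return unknown / total if total else 0.0


# Toy data: a subword vocabulary checked against raw and tokenized text.
vocab = load_vocab(["▁the\n", "▁cat\n", "s\n"])
print(oov_rate(["the cats"], vocab))      # raw text: every token is unknown
print(oov_rate(["▁the ▁cat s"], vocab))   # tokenized text: no unknown tokens
```

With real data, open file objects can be passed in place of the lists. A rate close to 1.0 would suggest that the text and the vocabulary are not tokenized the same way.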

Here are my params:

learning_rate = '0.0003'
batch_size = '128'
tokenizer_model_type = 'bpe'
vocab_size = 20000
model_type = 'NMTSmallV1'

sourceTrainFile = "gdrive/MyDrive/VGR/en-" + languageCode2 + "/src-train.txt"
targetTrainFile = "gdrive/MyDrive/VGR/en-" + languageCode2 + "/tgt-train.txt"

vocabPath = "gdrive/MyDrive/VGR/en-" + languageCode2 + "/tf/vocab/"
sourceVocabPath = vocabPath + 'SourceSP'
targetVocabPath = vocabPath + 'TargetSP'
import os

if not os.path.isdir(vocabPath):
    os.makedirs(vocabPath)
!onmt-build-vocab --sentencepiece model_type=$tokenizer_model_type --size $vocab_size --save_vocab $sourceVocabPath $sourceTrainFile
!onmt-build-vocab --sentencepiece model_type=$tokenizer_model_type --size $vocab_size --save_vocab $targetVocabPath $targetTrainFile

# Create the YAML configuration file (Basic)

config = '''# en-''' + languageCode2 + '''.yaml

# Where the model will be saved
model_dir: gdrive/MyDrive/VGR/en-''' + languageCode2 + '''/model/tf/''' + modelName + '''

data:
  train_features_file: gdrive/MyDrive/VGR/en-''' + languageCode2 + '''/src-train.txt
  train_labels_file: gdrive/MyDrive/VGR/en-''' + languageCode2 + '''/tgt-train.txt
  eval_features_file: gdrive/MyDrive/VGR/en-''' + languageCode2 + '''/src-val.txt
  eval_labels_file: gdrive/MyDrive/VGR/en-''' + languageCode2 + '''/tgt-val.txt

  # Where the vocab(s)
  source_vocabulary: ''' + sourceVocabPath + '''.vocab
  target_vocabulary: ''' + targetVocabPath + '''.vocab

params:
  learning_rate: ''' + learning_rate + '''
  minimum_learning_rate: 0.0001
  beam_width: 5
  optimizer: SGD
  optimizer_params:
    clipnorm: 1

train:
  save_checkpoints_steps: ''' + str(save_checkpoint_steps) + '''
  # (optional) How many checkpoints to keep on disk.
  keep_checkpoint_max: 10
  batch_size: ''' + batch_size + '''
  effective_batch_size: 1

eval:
  steps: 2000
  # Available scorers: bleu, rouge, wer, ter, prf
  #scorers: bleu
  #export_on_best: bleu
  export_format: saved_model

infer:
  n_best: 3
  with_scores: true '''

with open("en-" + languageCode2 + ".yaml", "w+") as config_yaml:
    config_yaml.write(config)

!cat "en-"$languageCode2".yaml"
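The same kind of file can also be generated with a single f-string, which keeps the YAML indentation visible in the notebook. This is a minimal sketch with placeholder values; `build_config` and the arguments passed to it are illustrative, and only a few of the keys from my config are shown:

```python
import textwrap


def build_config(language_code, model_name, learning_rate, batch_size):
    """Return the text of a small OpenNMT-tf style YAML configuration."""
    base = f"gdrive/MyDrive/VGR/en-{language_code}"
    return textwrap.dedent(f"""\
        model_dir: {base}/model/tf/{model_name}

        data:
          train_features_file: {base}/src-train.txt
          train_labels_file: {base}/tgt-train.txt
          source_vocabulary: {base}/tf/vocab/SourceSP.vocab
          target_vocabulary: {base}/tf/vocab/TargetSP.vocab

        params:
          learning_rate: {learning_rate}

        train:
          batch_size: {batch_size}
        """)


# Placeholder values for illustration only.
config = build_config("fr", "my_model", 0.0003, 128)
print(config)
```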

And finally the training command:

!onmt-main --model_type $model_type --config "en-"$languageCode2".yaml" --auto_config train --with_eval

In case it helps, this was my last try after tweaking some parameters (I tried to use the default params as much as possible). In this run I get INF as perplexity, which is worse than my previous tries.

2021-05-28 13:45:16.959955: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 13:45:18.624459: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-28 13:45:18.625247: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-05-28 13:45:18.638273: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.638895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-05-28 13:45:18.638930: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 13:45:18.641610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-28 13:45:18.641677: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-28 13:45:18.643239: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-28 13:45:18.643570: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-28 13:45:18.645293: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-05-28 13:45:18.645883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-28 13:45:18.646048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-28 13:45:18.646141: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.646721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.647256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-28 13:45:18.689000: I main.py:318] Using OpenNMT-tf version 2.18.1
2021-05-28 13:45:18.689000: I main.py:318] Using model:
(model): NMTSmallV1(
(examples_inputter): SequenceToSequenceInputter(
(features_inputter): WordEmbedder()
(labels_inputter): WordEmbedder()
(inputters): ListWrapper(
(0): WordEmbedder()
(1): WordEmbedder()
)
)
(encoder): RNNEncoder(
(rnn): RNN(
(rnn): RNN(
(cell): StackedRNNCells(
(cells): ListWrapper(
(0): RNNCellWrapper(
(layer): LSTMCell(512)
(cell): LSTMCell(512)
)
(1): RNNCellWrapper(
(layer): LSTMCell(512)
(cell): LSTMCell(512)
)
)
)
)
(reducer): ConcatReducer()
)
)
(decoder): AttentionalRNNDecoder(
(bridge): CopyBridge()
(attention_mechanism): LuongAttention(
(memory_layer): Dense(512)
)
(cell): AttentionWrapper()
)
)

2021-05-28 13:45:18.689959: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-28 13:45:18.690150: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-28 13:45:18.690316: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.690927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2021-05-28 13:45:18.690966: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 13:45:18.691015: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-28 13:45:18.691035: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-28 13:45:18.691054: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-28 13:45:18.691072: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-28 13:45:18.691090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-05-28 13:45:18.691107: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-28 13:45:18.691125: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-28 13:45:18.691199: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.691756: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:18.692274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-28 13:45:18.692320: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 13:45:19.368493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-28 13:45:19.368544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-28 13:45:19.368556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-05-28 13:45:19.368748: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:19.369399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:19.369969: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-28 13:45:19.370469: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-05-28 13:45:19.370516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14957 MB memory) → physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2021-05-28 13:45:19.374000: I main.py:326] Using parameters:
data:
eval_features_file: gdrive/MyDrive/VGR/en-fr/src-val.txt
eval_labels_file: gdrive/MyDrive/VGR/en-fr/tgt-val.txt
source_vocabulary: gdrive/MyDrive/VGR/en-fr/tf/vocab/SourceSP.vocab
target_vocabulary: gdrive/MyDrive/VGR/en-fr/tf/vocab/TargetSP.vocab
train_features_file: gdrive/MyDrive/VGR/en-fr/src-train.txt
train_labels_file: gdrive/MyDrive/VGR/en-fr/tgt-train.txt
eval:
batch_size: 32
batch_type: examples
export_format: saved_model
length_bucket_width: 5
steps: 2000
infer:
batch_size: 32
batch_type: examples
length_bucket_width: 5
n_best: 3
with_scores: true
model_dir: gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtNMTSmallV1_tmtbpe_vs20000_lr0.0003_bs64
params:
average_loss_in_time: false
beam_width: 5
learning_rate: 0.0002
minimum_learning_rate: 0.0001
num_hypotheses: 3
optimizer: Adam
optimizer_params:
beta_1: 0.8
beta_2: 0.998
score:
batch_size: 64
batch_type: examples
length_bucket_width: 5
train:
batch_size: 64
batch_type: examples
effective_batch_size: 1
keep_checkpoint_max: 10
length_bucket_width: 1
max_step: 500000
maximum_features_length: 80
maximum_labels_length: 80
sample_buffer_size: -1
save_checkpoints_steps: 11000
save_summary_steps: 100

2021-05-28 13:45:19.568000: I runner.py:242] Restored checkpoint gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtNMTSmallV1_tmtbpe_vs20000_lr0.0003_bs64/ckpt-1
2021-05-28 13:45:19.573000: W deprecation.py:339] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/summary/summary_iterator.py:31: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
2021-05-28 13:45:21.000000: I main.py:326] Accumulate gradients of 1 iterations to reach effective batch size of 1
2021-05-28 13:45:21.051000: I dataset_ops.py:1996] Training on 200980 examples
2021-05-28 13:45:21.861929: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-28 13:45:21.863488: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299995000 Hz
2021-05-28 13:45:43.440000: I control_flow.py:1218] Number of model parameters: 40973345
2021-05-28 13:45:44.237000: I control_flow.py:1218] Number of model weights: 18 (trainable = 18, non trainable = 0)
2021-05-28 13:46:02.804032: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-28 13:46:03.977579: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-28 13:46:23.289000: I runner.py:281] Step = 100 ; steps/s = 5.62, source words/s = 6808, target words/s = 8035 ; Learning rate = 0.000200 ; Loss = 1266.827637
2021-05-28 13:46:40.876000: I runner.py:281] Step = 200 ; steps/s = 5.69, source words/s = 6912, target words/s = 8142 ; Learning rate = 0.000200 ; Loss = 1018.049866
2021-05-28 13:46:58.843000: I runner.py:281] Step = 300 ; steps/s = 5.57, source words/s = 6830, target words/s = 8138 ; Learning rate = 0.000200 ; Loss = 1051.877197
2021-05-28 13:47:16.394000: I runner.py:281] Step = 400 ; steps/s = 5.70, source words/s = 6912, target words/s = 8123 ; Learning rate = 0.000200 ; Loss = 1035.163452
2021-05-28 13:47:34.256000: I runner.py:281] Step = 500 ; steps/s = 5.60, source words/s = 6834, target words/s = 8108 ; Learning rate = 0.000200 ; Loss = 910.203552
2021-05-28 13:47:51.854000: I runner.py:281] Step = 600 ; steps/s = 5.68, source words/s = 6953, target words/s = 8187 ; Learning rate = 0.000200 ; Loss = 1179.720825
2021-05-28 13:48:09.652000: I runner.py:281] Step = 700 ; steps/s = 5.62, source words/s = 6882, target words/s = 8143 ; Learning rate = 0.000200 ; Loss = 997.235657
2021-05-28 13:48:27.638000: I runner.py:281] Step = 800 ; steps/s = 5.56, source words/s = 6860, target words/s = 8135 ; Learning rate = 0.000200 ; Loss = 884.168701
2021-05-28 13:48:45.839000: I runner.py:281] Step = 900 ; steps/s = 5.50, source words/s = 6827, target words/s = 8123 ; Learning rate = 0.000200 ; Loss = 1198.313477
2021-05-28 13:49:02.966000: I runner.py:281] Step = 1000 ; steps/s = 5.84, source words/s = 6942, target words/s = 8144 ; Learning rate = 0.000200 ; Loss = 1476.122803
2021-05-28 13:49:21.061000: I runner.py:281] Step = 1100 ; steps/s = 5.53, source words/s = 6860, target words/s = 8109 ; Learning rate = 0.000200 ; Loss = 1158.275757
2021-05-28 13:49:39.017000: I runner.py:281] Step = 1200 ; steps/s = 5.57, source words/s = 6806, target words/s = 8070 ; Learning rate = 0.000200 ; Loss = 1226.175171
2021-05-28 13:49:56.660000: I runner.py:281] Step = 1300 ; steps/s = 5.67, source words/s = 6877, target words/s = 8115 ; Learning rate = 0.000200 ; Loss = 1260.031006
2021-05-28 13:50:14.503000: I runner.py:281] Step = 1400 ; steps/s = 5.61, source words/s = 6773, target words/s = 8005 ; Learning rate = 0.000200 ; Loss = 890.193542
2021-05-28 13:50:32.387000: I runner.py:281] Step = 1500 ; steps/s = 5.59, source words/s = 6852, target words/s = 8144 ; Learning rate = 0.000200 ; Loss = 811.623596
2021-05-28 13:50:50.466000: I runner.py:281] Step = 1600 ; steps/s = 5.53, source words/s = 6818, target words/s = 8061 ; Learning rate = 0.000200 ; Loss = 1026.326904
2021-05-28 13:51:07.724000: I runner.py:281] Step = 1700 ; steps/s = 5.80, source words/s = 6953, target words/s = 8159 ; Learning rate = 0.000200 ; Loss = 715.587097
2021-05-28 13:51:25.901000: I runner.py:281] Step = 1800 ; steps/s = 5.50, source words/s = 6827, target words/s = 8106 ; Learning rate = 0.000200 ; Loss = 952.115234
2021-05-28 13:51:43.502000: I runner.py:281] Step = 1900 ; steps/s = 5.68, source words/s = 6910, target words/s = 8171 ; Learning rate = 0.000200 ; Loss = 926.592529
2021-05-28 13:52:01.011000: I runner.py:281] Step = 2000 ; steps/s = 5.71, source words/s = 6859, target words/s = 8098 ; Learning rate = 0.000200 ; Loss = 791.998291
2021-05-28 13:52:01.013000: I training.py:202] Running evaluation for step 2000
2021-05-28 13:57:09.319000: I training.py:202] Evaluation result for step 2000: loss = 514.641296 ; perplexity = inf
2021-05-28 13:57:26.899000: I runner.py:281] Step = 2100 ; steps/s = 5.69, source words/s = 6885, target words/s = 8091 ; Learning rate = 0.000200 ; Loss = 839.368347
2021-05-28 13:57:44.811000: I runner.py:281] Step = 2200 ; steps/s = 5.58, source words/s = 6889, target words/s = 8177 ; Learning rate = 0.000200 ; Loss = 908.090759
2021-05-28 13:58:02.761000: I runner.py:281] Step = 2300 ; steps/s = 5.57, source words/s = 6878, target words/s = 8155 ; Learning rate = 0.000200 ; Loss = 1177.083618
2021-05-28 13:58:20.328000: I runner.py:281] Step = 2400 ; steps/s = 5.69, source words/s = 6858, target words/s = 8075 ; Learning rate = 0.000200 ; Loss = 1329.044312
2021-05-28 13:58:38.186000: I runner.py:281] Step = 2500 ; steps/s = 5.60, source words/s = 6852, target words/s = 8128 ; Learning rate = 0.000200 ; Loss = 1493.157349
2021-05-28 13:58:55.756000: I runner.py:281] Step = 2600 ; steps/s = 5.69, source words/s = 6886, target words/s = 8121 ; Learning rate = 0.000200 ; Loss = 1473.173584
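The `perplexity = inf` in the evaluation line is what one would expect if perplexity is computed as exp(loss) in 32-bit floating point: exp(x) overflows once x exceeds roughly 88.7, and the reported loss is 514.64. Here is a small sketch of that arithmetic, under the assumption (not confirmed by the log) that this is how the value is derived:

```python
import math

FLOAT32_MAX = 3.4028235e38  # largest finite 32-bit float


def perplexity_float32(loss):
    """exp(loss), returning inf where a 32-bit float would overflow."""
    if loss > math.log(FLOAT32_MAX):
        return math.inf
    return math.exp(loss)


print(perplexity_float32(514.641296))  # inf
# A perplexity of 15 corresponds to a loss of about 2.71.
print(round(math.log(15), 2))  # 2.71
```

A loss in the hundreds may indicate that it is summed over each sentence rather than averaged per token, in which case it is not directly comparable with the perplexity of 15 from my earlier runs.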

In general I would recommend following the quickstart:

https://opennmt.net/OpenNMT-tf/quickstart.html

You could just change the data configuration with your own training data and vocabulary and then run the training with --auto_config. You should get good results this way.

Also make sure the vocabulary files have the correct format:

https://opennmt.net/OpenNMT-tf/vocabulary.html
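As a quick check of the format: a vocabulary written directly by SentencePiece stores a tab-separated score after each token, whereas OpenNMT-tf expects one token per line. The helper below is a hypothetical sketch (not part of OpenNMT-tf) that flags lines which still look like `token<TAB>score`:

```python
def lines_with_scores(vocab_lines):
    """Return the vocabulary lines that look like 'token<TAB>score'."""
    flagged = []
    for line in vocab_lines:
        token, sep, score = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # plain one-token-per-line entry
        try:
            float(score)
        except ValueError:
            continue  # tab present but not followed by a number
        flagged.append(line.rstrip("\n"))
    return flagged


print(lines_with_scores(["▁de\n", "▁la\t-3.2\n", "es\n"]))  # ['▁la\t-3.2']
```

An empty result on the real vocabulary file would suggest that the scores have already been stripped.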

Thank you @guillaumekln,

That’s exactly what I did originally, but I still didn’t get any decent results. I tried again last night with only these params for train/params in my YAML file:

train:
  save_checkpoints_steps: 2000
  keep_checkpoint_max: 10

and my training line:

!onmt-main --model_type Transformer --config "en-fr.yaml" --auto_config train --with_eval

My vocab looks like this (built with SentencePiece, just like in the quickstart explanation):

▁, ▁d es ▁l en ou ▁a ▁p ▁. ▁c ai qu on ▁s ▁qu re ▁v an ▁de er ▁m ...

Here are the results:

The perplexity skyrocketed after 6000 steps. I’m training on 200k sentences, which yields really good results in OpenNMT-py. Only 20k steps were completed in 6 hours, whereas in the equivalent time OpenNMT-py would give me a really decent model with a BLEU score of about 60 to 70.

I will try the small NMT model tonight, but so far Transformer has never worked out of the box for me in OpenNMT-tf.

This is the result of infer:

-2250.230957 ||| ré ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁adoraient ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁voyiez ▁adoraient ▁adoraient ▁adoraient ▁adoraient 
▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient ▁adoraient
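The hypothesis above is almost entirely two tokens repeated. A small helper can quantify this for a whole prediction file; the sketch assumes each line has the `score ||| tokens` layout shown above, `distinct_ratio` is only an illustrative name, and the sample line is a shortened stand-in for the real output:

```python
def distinct_ratio(line):
    """Ratio of distinct tokens to total tokens in a 'score ||| tokens' line."""
    _, _, hypothesis = line.partition("|||")
    tokens = hypothesis.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Shortened illustration of a degenerate hypothesis.
sample = "-2250.23 ||| ré ▁voyiez ▁voyiez ▁voyiez ▁adoraient ▁adoraient"
print(distinct_ratio(sample))  # 0.5
```

A ratio near zero on long outputs points to the decoder looping rather than to a tokenization issue at inference time.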

Here is my log:

2021-05-29 04:37:06.096194: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-29 04:37:07.942853: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-29 04:37:07.943748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-05-29 04:37:08.011891: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.012555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-05-29 04:37:08.012593: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-29 04:37:08.170022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-29 04:37:08.170131: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-29 04:37:08.333822: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-29 04:37:08.373344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-29 04:37:08.587261: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-05-29 04:37:08.669135: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-29 04:37:08.678098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-29 04:37:08.678297: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.678994: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.679634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-29 04:37:08.684000: I onmt-main:8] Creating model directory gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64
2021-05-29 04:37:08.833000: I main.py:318] Using OpenNMT-tf version 2.18.1
2021-05-29 04:37:08.833000: I main.py:318] Using model:
(model): TransformerBase(
(examples_inputter): SequenceToSequenceInputter(
(features_inputter): WordEmbedder()
(labels_inputter): WordEmbedder()
(inputters): ListWrapper(
(0): WordEmbedder()
(1): WordEmbedder()
)
)
(encoder): SelfAttentionEncoder(
(position_encoder): SinusoidalPositionEncoder(
(reducer): SumReducer()
)
(layer_norm): LayerNorm()
(layers): ListWrapper(
(0): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(1): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(2): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(3): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(4): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(5): SelfAttentionEncoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
)
)
(decoder): SelfAttentionDecoder(
(position_encoder): SinusoidalPositionEncoder(
(reducer): SumReducer()
)
(layer_norm): LayerNorm()
(layers): ListWrapper(
(0): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(1): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(2): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(3): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(4): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(5): SelfAttentionDecoderLayer(
(self_attention): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
(attention): ListWrapper(
(0): TransformerLayerWrapper(
(layer): MultiHeadAttention(
(linear_queries): Dense(512)
(linear_keys): Dense(512)
(linear_values): Dense(512)
(linear_output): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
(ffn): TransformerLayerWrapper(
(layer): FeedForwardNetwork(
(inner): Dense(2048)
(outer): Dense(512)
)
(input_layer_norm): LayerNorm()
)
)
)
)
)

2021-05-29 04:37:08.836873: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-29 04:37:08.837111: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-29 04:37:08.837296: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.837941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-05-29 04:37:08.837977: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-29 04:37:08.838032: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-29 04:37:08.838069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-29 04:37:08.838090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-29 04:37:08.838109: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-29 04:37:08.838122: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-05-29 04:37:08.838135: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-29 04:37:08.838149: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-29 04:37:08.838215: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.838809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:08.839351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-05-29 04:37:08.839391: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-29 04:37:09.496915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-29 04:37:09.496967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-29 04:37:09.496977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-05-29 04:37:09.497200: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:09.497815: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:09.498409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-29 04:37:09.498940: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-05-29 04:37:09.498981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14760 MB memory) → physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
2021-05-29 04:37:09.503000: I main.py:326] Using parameters:
data:
  eval_features_file: gdrive/MyDrive/VGR/en-fr/src-val.txt
  eval_labels_file: gdrive/MyDrive/VGR/en-fr/tgt-val.txt
  source_vocabulary: gdrive/MyDrive/VGR/en-fr/tf/vocab/SourceSP.vocab
  target_vocabulary: gdrive/MyDrive/VGR/en-fr/tf/vocab/TargetSP.vocab
  train_features_file: gdrive/MyDrive/VGR/en-fr/src-train.txt
  train_labels_file: gdrive/MyDrive/VGR/en-fr/tgt-train.txt
eval:
  batch_size: 32
  batch_type: examples
  length_bucket_width: 5
  steps: 2000
infer:
  batch_size: 32
  batch_type: examples
  length_bucket_width: 5
  n_best: 3
  with_scores: true
model_dir: gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 3
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
  batch_type: examples
  length_bucket_width: 5
train:
  average_last_checkpoints: 8
  batch_size: 3072
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 10
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 2000
  save_summary_steps: 100

2021-05-29 04:37:09.882000: W runner.py:242] No checkpoint to restore in gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64
2021-05-29 04:37:09.890000: W deprecation.py:339] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/summary/summary_iterator.py:31: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
2021-05-29 04:37:11.294000: I main.py:326] Accumulate gradients of 9 iterations to reach effective batch size of 25000
2021-05-29 04:37:11.346000: I dataset_ops.py:1996] Training on 200980 examples
2021-05-29 04:37:12.193365: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-29 04:37:12.195501: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2000175000 Hz
2021-05-29 04:37:27.261000: I control_flow.py:1218] Number of model parameters: 74882081
2021-05-29 04:37:28.001000: I control_flow.py:1218] Number of model weights: 260 (trainable = 260, non trainable = 0)
2021-05-29 04:37:46.228423: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-29 04:37:47.677950: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-29 04:37:47.733644: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-29 04:38:11.721000: I training.py:186] Saved checkpoint gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64/ckpt-1
2021-05-29 04:40:07.558000: I runner.py:281] Step = 100 ; steps/s = 0.86, source words/s = 19796, target words/s = 23403 ; Learning rate = 0.000012 ; Loss = 7.623491
2021-05-29 04:42:01.436000: I runner.py:281] Step = 200 ; steps/s = 0.88, source words/s = 19533, target words/s = 23101 ; Learning rate = 0.000025 ; Loss = 5.990872
2021-05-29 04:43:58.076000: I runner.py:281] Step = 300 ; steps/s = 0.86, source words/s = 19817, target words/s = 23442 ; Learning rate = 0.000037 ; Loss = 4.356587
2021-05-29 04:45:51.813000: I runner.py:281] Step = 400 ; steps/s = 0.88, source words/s = 19560, target words/s = 23129 ; Learning rate = 0.000050 ; Loss = 3.844167
2021-05-29 04:47:48.628000: I runner.py:281] Step = 500 ; steps/s = 0.86, source words/s = 19798, target words/s = 23406 ; Learning rate = 0.000062 ; Loss = 3.668841
2021-05-29 04:49:42.566000: I runner.py:281] Step = 600 ; steps/s = 0.88, source words/s = 19515, target words/s = 23089 ; Learning rate = 0.000074 ; Loss = 3.545279
2021-05-29 04:51:36.424000: I runner.py:281] Step = 700 ; steps/s = 0.88, source words/s = 19531, target words/s = 23104 ; Learning rate = 0.000087 ; Loss = 3.427966
2021-05-29 04:53:33.353000: I runner.py:281] Step = 800 ; steps/s = 0.86, source words/s = 19778, target words/s = 23385 ; Learning rate = 0.000099 ; Loss = 3.278536
2021-05-29 04:55:27.020000: I runner.py:281] Step = 900 ; steps/s = 0.88, source words/s = 19562, target words/s = 23143 ; Learning rate = 0.000111 ; Loss = 3.117330
2021-05-29 04:57:24.039000: I runner.py:281] Step = 1000 ; steps/s = 0.85, source words/s = 19764, target words/s = 23366 ; Learning rate = 0.000124 ; Loss = 3.013324
2021-05-29 04:59:17.479000: I runner.py:281] Step = 1100 ; steps/s = 0.88, source words/s = 19605, target words/s = 23191 ; Learning rate = 0.000136 ; Loss = 2.950432
2021-05-29 05:01:11.052000: I runner.py:281] Step = 1200 ; steps/s = 0.88, source words/s = 19576, target words/s = 23160 ; Learning rate = 0.000148 ; Loss = 2.912643
2021-05-29 05:03:08.021000: I runner.py:281] Step = 1300 ; steps/s = 0.85, source words/s = 19769, target words/s = 23376 ; Learning rate = 0.000161 ; Loss = 2.814301
2021-05-29 05:05:01.541000: I runner.py:281] Step = 1400 ; steps/s = 0.88, source words/s = 19597, target words/s = 23175 ; Learning rate = 0.000173 ; Loss = 2.786859
2021-05-29 05:06:58.614000: I runner.py:281] Step = 1500 ; steps/s = 0.85, source words/s = 19758, target words/s = 23356 ; Learning rate = 0.000185 ; Loss = 2.701603
2021-05-29 05:08:52.069000: I runner.py:281] Step = 1600 ; steps/s = 0.88, source words/s = 19602, target words/s = 23185 ; Learning rate = 0.000198 ; Loss = 2.689895
2021-05-29 05:10:45.455000: I runner.py:281] Step = 1700 ; steps/s = 0.88, source words/s = 19599, target words/s = 23201 ; Learning rate = 0.000210 ; Loss = 2.796372
2021-05-29 05:12:42.423000: I runner.py:281] Step = 1800 ; steps/s = 0.85, source words/s = 19769, target words/s = 23377 ; Learning rate = 0.000222 ; Loss = 2.669317
2021-05-29 05:14:35.702000: I runner.py:281] Step = 1900 ; steps/s = 0.88, source words/s = 19638, target words/s = 23221 ; Learning rate = 0.000235 ; Loss = 2.646419
2021-05-29 05:16:32.737000: I runner.py:281] Step = 2000 ; steps/s = 0.85, source words/s = 19757, target words/s = 23363 ; Learning rate = 0.000247 ; Loss = 2.581818
2021-05-29 05:16:35.757000: I training.py:186] Saved checkpoint gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64/ckpt-2000
2021-05-29 05:16:35.757000: I training.py:202] Running evaluation for step 2000
2021-05-29 05:18:00.512000: I training.py:202] Evaluation result for step 2000: loss = 1.473726 ; perplexity = 4.365469
2021-05-29 05:19:53.792000: I runner.py:281] Step = 2100 ; steps/s = 0.88, source words/s = 19634, target words/s = 23225 ; Learning rate = 0.000260 ; Loss = 2.603908
2021-05-29 05:21:50.015000: I runner.py:281] Step = 2200 ; steps/s = 0.86, source words/s = 19786, target words/s = 23395 ; Learning rate = 0.000272 ; Loss = 2.301361
2021-05-29 05:23:43.953000: I runner.py:281] Step = 2300 ; steps/s = 0.88, source words/s = 19631, target words/s = 23221 ; Learning rate = 0.000284 ; Loss = 2.553724
2021-05-29 05:25:37.281000: I runner.py:281] Step = 2400 ; steps/s = 0.88, source words/s = 19625, target words/s = 23213 ; Learning rate = 0.000297 ; Loss = 2.578810
2021-05-29 05:27:34.195000: I runner.py:281] Step = 2500 ; steps/s = 0.86, source words/s = 19777, target words/s = 23389 ; Learning rate = 0.000309 ; Loss = 2.547229
2021-05-29 05:29:27.641000: I runner.py:281] Step = 2600 ; steps/s = 0.88, source words/s = 19601, target words/s = 23187 ; Learning rate = 0.000321 ; Loss = 2.535319
2021-05-29 05:31:24.533000: I runner.py:281] Step = 2700 ; steps/s = 0.86, source words/s = 19791, target words/s = 23391 ; Learning rate = 0.000334 ; Loss = 2.493903
2021-05-29 05:33:17.668000: I runner.py:281] Step = 2800 ; steps/s = 0.88, source words/s = 19655, target words/s = 23253 ; Learning rate = 0.000346 ; Loss = 2.494510
2021-05-29 05:35:10.868000: I runner.py:281] Step = 2900 ; steps/s = 0.88, source words/s = 19643, target words/s = 23237 ; Learning rate = 0.000358 ; Loss = 2.533715
2021-05-29 05:37:07.679000: I runner.py:281] Step = 3000 ; steps/s = 0.86, source words/s = 19801, target words/s = 23407 ; Learning rate = 0.000371 ; Loss = 2.478043
2021-05-29 05:39:00.936000: I runner.py:281] Step = 3100 ; steps/s = 0.88, source words/s = 19633, target words/s = 23227 ; Learning rate = 0.000383 ; Loss = 2.483327
2021-05-29 05:40:57.778000: I runner.py:281] Step = 3200 ; steps/s = 0.86, source words/s = 19788, target words/s = 23403 ; Learning rate = 0.000395 ; Loss = 2.454756
2021-05-29 05:42:50.893000: I runner.py:281] Step = 3300 ; steps/s = 0.88, source words/s = 19666, target words/s = 23254 ; Learning rate = 0.000408 ; Loss = 2.443048
2021-05-29 05:44:43.879000: I runner.py:281] Step = 3400 ; steps/s = 0.89, source words/s = 19673, target words/s = 23284 ; Learning rate = 0.000420 ; Loss = 2.504201
2021-05-29 05:46:40.462000: I runner.py:281] Step = 3500 ; steps/s = 0.86, source words/s = 19842, target words/s = 23454 ; Learning rate = 0.000432 ; Loss = 2.422432
2021-05-29 05:48:33.458000: I runner.py:281] Step = 3600 ; steps/s = 0.89, source words/s = 19674, target words/s = 23280 ; Learning rate = 0.000445 ; Loss = 2.425817
2021-05-29 05:50:30.103000: I runner.py:281] Step = 3700 ; steps/s = 0.86, source words/s = 19822, target words/s = 23441 ; Learning rate = 0.000457 ; Loss = 2.395871
2021-05-29 05:52:23.175000: I runner.py:281] Step = 3800 ; steps/s = 0.88, source words/s = 19687, target words/s = 23265 ; Learning rate = 0.000470 ; Loss = 2.408193
2021-05-29 05:54:17.433000: I runner.py:281] Step = 3900 ; steps/s = 0.88, source words/s = 19789, target words/s = 23409 ; Learning rate = 0.000482 ; Loss = 2.469665
2021-05-29 05:56:12.592000: I runner.py:281] Step = 4000 ; steps/s = 0.87, source words/s = 19745, target words/s = 23361 ; Learning rate = 0.000494 ; Loss = 2.378997
2021-05-29 05:56:15.792000: I training.py:186] Saved checkpoint gdrive/MyDrive/VGR/en-fr/model/tf/OpenNMT-TF_fr_mtTransformer_tmtbpe_vs20000_lr0.0003_bs64/ckpt-4000
2021-05-29 05:56:15.792000: I training.py:202] Running evaluation for step 4000
2021-05-29 05:57:33.571000: I training.py:202] Evaluation result for step 4000: loss = 1.290468 ; perplexity = 3.634488
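
As a sanity check on the log above: the logged learning rates appear consistent with the standard Noam schedule using the values from the parameter dump (learning_rate 2.0 as the scale, model_dim 512, warmup_steps 8000), and the reported perplexity matches exp(loss). The sketch below is my own reconstruction of that formula, not code taken from OpenNMT-tf; the function name noam_lr is made up for illustration.

```python
import math


def noam_lr(step, scale=2.0, model_dim=512, warmup_steps=8000):
    """Standard Noam schedule (assumed here, not copied from OpenNMT-tf).

    Linear warmup for `warmup_steps` steps, then inverse square root decay.
    Defaults are the values shown in the parameter dump above.
    """
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Compare against the "Learning rate = ..." values printed in the log.
for step in (100, 2000, 4000):
    print(step, f"{noam_lr(step):.6f}")

# Perplexity reported at evaluation is exp(loss).
print(f"{math.exp(1.473726):.4f}")
```

So the learning rate is still warming up at step 4000, and the configured learning_rate of 2.0 is a scale factor rather than the actual step size.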

Are the training files already tokenized?
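
A quick way to check this is to measure how many whitespace-separated tokens in the training file are actually present in the vocabulary. This is only a minimal sketch: the helper names load_vocab and oov_rate are made up for illustration, and it assumes the vocabulary has one token per line, optionally followed by a tab and a score. If the text is raw while the vocabulary contains SentencePiece pieces, almost every token will be out of vocabulary, which would explain a model that only outputs “unk”.

```python
def load_vocab(lines):
    """Collect tokens from vocabulary lines.

    Assumption: one token per line, optionally followed by a tab and a score.
    """
    return {line.rstrip("\n").split("\t")[0] for line in lines if line.strip()}


def oov_rate(sentences, vocab):
    """Return the fraction of whitespace-separated tokens missing from vocab."""
    total = 0
    unknown = 0
    for sentence in sentences:
        for token in sentence.split():
            total += 1
            if token not in vocab:
                unknown += 1
    return unknown / total if total else 0.0


# Toy data standing in for a real vocabulary and training file.
vocab = load_vocab(["<unk>\t0", "▁the\t-3.1", "▁cat\t-5.2", "s\t-4.0"])
raw = ["the cats"]               # untokenized text
tokenized = ["▁the ▁cat s"]      # same text split into subword pieces

print(oov_rate(raw, vocab))
print(oov_rate(tokenized, vocab))
```

With real files, the same two functions can be fed open(...) handles for the vocabulary and for src-train.txt; a rate close to 1.0 suggests the training data was never tokenized with the SentencePiece model.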

I believe you hit the spot! I completely forgot to tokenize my training data!

I will retry right away.

Thanks so much!
