It's still not working. Could you please share the parameters in your .yml file and the complete parameters printed when training is launched? Mine are below.
My configuration:
# The directory where models and summaries will be saved. It is created if it does not exist.
model_dir: Model/Transformer-Big/2002756/

data:
  # (required for train_and_eval and train run types).
  train_features_file: Data/Training/
  train_labels_file: Data/Training/

  # (required for train_and_eval and eval run types).
  eval_features_file: Data/Evaluation/newstest-2013.fr_raw.txt
  eval_labels_file: Data/Evaluation/newstest-2013.en_raw.txt

  # (optional) Models may require additional resource files (e.g. vocabularies).
  source_words_vocabulary: fr-vocab-30000-tokenized.txt
  target_words_vocabulary: en-vocab-30000-tokenized.txt
  source_tokenizer_config: tokenization.yml
  target_tokenizer_config: tokenization.yml

# Training options.
train:
  batch_size: 1024
  # (optional) Batch size is the number of "examples" or "tokens" (default: "examples").
  batch_type: examples

  # (optional) Save a checkpoint every this many steps.
  save_checkpoints_steps: 1000
  # (optional) How many checkpoints to keep on disk.
  keep_checkpoint_max: 10

  # (optional) Save summaries every this many steps.
  save_summary_steps: 1000

  # (optional) Train for this many steps. If not set, train forever.
  train_steps: 1280000

  # (optional) The number of threads to use for processing data in parallel (default: 4).
  num_threads: 8

  # (optional) The number of elements from which to sample during shuffling (default: 500000).
  # Set 0 or null to disable shuffling, -1 to match the number of training examples.
  sample_buffer_size: 500000

  # (optional) Number of checkpoints to average at the end of the training to the directory
  # model_dir/avg (default: 0).
  average_last_checkpoints: 10

# (optional) Evaluation options.
eval:
  # (optional) The batch size to use (default: 32).
  batch_size: 1024

  # (optional) The number of threads to use for processing data in parallel (default: 1).
  num_threads: 8

  # (optional) Evaluate every this many seconds (default: 18000).
  eval_delay: 0

  # (optional) Save evaluation predictions in model_dir/eval/.
  save_eval_predictions: True
  # (optional) Evaluator or list of evaluators that are called on the saved evaluation predictions.
  # Available evaluators: BLEU, BLEU-detok, ROUGE
  external_evaluators: [BLEU, BLEU-detok]

  # (optional) Model exporter(s) to use during the training and evaluation loop:
  # last, final, best, or null (default: last).
  exporters: last
The information shown when training is launched:
INFO:tensorflow:Using parameters: {
  "data": {
    "eval_features_file": "Data/Evaluation/newstest-2013.fr_raw.txt",
    "eval_labels_file": "Data/Evaluation/newstest-2013.en_raw.txt",
    "source_tokenizer_config": "tokenization.yml",
    "source_words_vocabulary": "fr-vocab-30000-tokenized.txt",
    "target_tokenizer_config": "tokenization.yml",
    "target_words_vocabulary": "en-vocab-30000-tokenized.txt",
    "train_features_file": "Data/Training/",
    "train_labels_file": "Data/Training/"
  },
  "eval": {
    "batch_size": 1024,
    "eval_delay": 0,
    "exporters": "last",
    "external_evaluators": [
      "BLEU",
      "BLEU-detok"
    ],
    "num_threads": 8,
    "save_eval_predictions": true
  },
  "infer": {
    "batch_size": 32,
    "bucket_width": 5
  },
  "model_dir": "Model/Transformer-Big/2002756/",
  "params": {
    "average_loss_in_time": true,
    "beam_width": 4,
    "decay_params": {
      "model_dim": 1024,
      "warmup_steps": 8000
    },
    "decay_type": "noam_decay_v2",
    "gradients_accum": 1,
    "label_smoothing": 0.1,
    "learning_rate": 2.0,
    "length_penalty": 0.6,
    "optimizer": "LazyAdamOptimizer",
    "optimizer_params": {
      "beta1": 0.9,
      "beta2": 0.998
    },
    "score": {
      "batch_size": 64
    },
    "train": {
      "average_last_checkpoints": 10,
      "batch_size": 1024,
      "batch_type": "examples",
      "bucket_width": 1,
      "keep_checkpoint_max": 10,
      "maximum_features_length": 100,
      "maximum_labels_length": 100,
      "num_threads": 8,
      "sample_buffer_size": 500000,
      "save_checkpoints_steps": 1000,
      "save_summary_steps": 1000,
      "train_steps": 1280000
    }
  }
}
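
To make the comparison easier once you post your parameters, here is a minimal sketch of how the two parameter dumps could be compared key by key. It is plain Python and not part of OpenNMT-tf; `flatten` and `diff_params` are helper names I made up, and the `other` dictionary holds invented values purely for illustration. Only the values in `mine` come from my log.

```python
def flatten(params, prefix=""):
    """Flatten a nested dict into {"section.key": value} pairs."""
    flat = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_params(mine, yours):
    """Return {path: (mine_value, yours_value)} for every setting that differs.

    A key missing on one side is reported with None on that side.
    """
    a, b = flatten(mine), flatten(yours)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(set(a) | set(b))
        if a.get(path) != b.get(path)
    }


# A few values copied from my log; "other" is a made-up example, not real settings.
mine = {
    "train": {"batch_size": 1024, "batch_type": "examples"},
    "params": {"learning_rate": 2.0},
}
other = {
    "train": {"batch_size": 3072, "batch_type": "tokens"},
    "params": {"learning_rate": 2.0},
}

for path, (left, right) in diff_params(mine, other).items():
    print(f"{path}: mine={left!r} other={right!r}")
```

In practice the two dictionaries would be loaded from the JSON that follows "Using parameters:" in each log (for example with `json.loads`), so that defaults filled in at launch are compared as well.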