Hello,
I’m trying to use the dataset weights feature to train a multi-language model, but I’m getting this error:

```
ValueError: shuffle_buffer_size < 0 is not compatible with weighted datasets
```

Here are my training parameters:
```yaml
train:
  average_last_checkpoints: 4
  batch_size: 8115
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 2
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 66
  maximum_labels_length: 64
  sample_buffer_size: -1
  save_checkpoints_steps: 2000
  save_summary_steps: 100
  shuffle_buffer_size: 500000
```
And here is the relevant portion of my YAML data configuration with the weights:
```yaml
data:
  eval_features_file: ./PREP/Multi/test/OpenNMT/src-val-tokenized.txt
  eval_labels_file: ./PREP/Multi/test/OpenNMT/tgt-val-tokenized.txt
  source_sequence_controls:
    end: true
  source_vocabulary: ./PREP/Multi/test/OpenNMT/vocab/SourceSP.vocab.txt
  target_vocabulary: ./PREP/Multi/test/OpenNMT/vocab/TargetSP.vocab.txt
  train_features_file:
    - ./PREP/Multi/test/eng-nep/OpenNMT/TokenizedFiles/src-train-tokenized.txt
    - ./PREP/Multi/test/nep-eng/OpenNMT/TokenizedFiles/src-train-tokenized.txt
    - ./PREP/Multi/test/eng-fon/OpenNMT/TokenizedFiles/src-train-tokenized.txt
    - ./PREP/Multi/test/fon-eng/OpenNMT/TokenizedFiles/src-train-tokenized.txt
  train_files_weights:
    - 0.2329
    - 0.2329
    - 0.2671
    - 0.2671
  train_labels_file:
    - ./PREP/Multi/test/eng-nep/OpenNMT/TokenizedFiles/tgt-train-tokenized.txt
    - ./PREP/Multi/test/nep-eng/OpenNMT/TokenizedFiles/tgt-train-tokenized.txt
    - ./PREP/Multi/test/eng-fon/OpenNMT/TokenizedFiles/tgt-train-tokenized.txt
    - ./PREP/Multi/test/fon-eng/OpenNMT/TokenizedFiles/tgt-train-tokenized.txt
```
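In case it matters, I chose the weights so that they sum to 1 across the four corpora; a quick Python check of the values copied from the config above:

```python
# Sanity check: the per-corpus dataset weights from train_files_weights
# should add up to 1.0 (allowing for floating-point rounding).
weights = [0.2329, 0.2329, 0.2671, 0.2671]

total = sum(weights)
assert abs(total - 1.0) < 1e-9, f"weights sum to {total}, not 1.0"
print(f"weights sum to {total}")
```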
I have no clue what I’m doing wrong.
Thanks for the help,
Samuel