- The environment was obtained via `docker pull`, using the opennmt-py package image from GitHub.
- src/tgt vocab: `onmt_build_vocab -config train.yaml -n_sample 72000`
- src/tgt subword model: trained with [google/sentencepiece](https://github.com/google/sentencepiece), the unsupervised text tokenizer for neural-network-based text generation.

Question 1:

When I start model training with the YAML file given below, the following messages appear with high frequency:

```
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10003. Shrinking (round 1) down to 100.
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10002. Shrinking (round 2) down to 100.
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10003. Shrinking (round 3) down to 100.
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10003. Shrinking (round 4) down to 100.
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10003. Shrinking (round 1) down to 100.
unigram_model.cc(494) LOG(WARNING) Too big agenda size 10003. Shrinking (round 1) down to 100.
.....
[2024-01-19 12:37:57,483 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
[2024-01-19 12:37:57,483 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
[2024-01-19 12:37:57,483 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
[2024-01-19 12:37:58,932 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
[2024-01-19 12:38:03,003 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
[2024-01-19 12:38:03,004 WARNING] The batch will be filled until we reach 8, its size may exceed 4096 tokens
....
Grad overflow on iteration 7459
Using dynamic loss scale of 1
Grad overflow on iteration 7460
Using dynamic loss scale of 1
Grad overflow on iteration 7461
Using dynamic loss scale of 1
Grad overflow on iteration 7462
Using dynamic loss scale of 1
Grad overflow on iteration 7463
Using dynamic loss scale of 1
Grad overflow on iteration 7464
Using dynamic loss scale of 1
Grad overflow on iteration 7465
Using dynamic loss scale of 1
Grad overflow on iteration 7466
Using dynamic loss scale of 1
....
```
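The `Grad overflow` / `Using dynamic loss scale of 1` lines come from fp16 dynamic loss scaling (`model_dtype: "fp16"` with `loss_scale: 0` selects dynamic scaling). The scaler skips the optimizer step and halves the scale whenever a gradient overflows; a scale pinned at 1 for thousands of consecutive iterations means the fp16 gradients themselves are overflowing, i.e. training has effectively diverged. A minimal pure-Python sketch of the mechanism (an illustration only, not OpenNMT-py's actual implementation):

```python
# Minimal sketch of dynamic fp16 loss scaling (illustration only,
# not OpenNMT-py's actual code).

def step_loss_scaler(scale, overflow, *, good_steps=0,
                     growth_interval=2000, min_scale=1.0):
    """Return the updated (scale, good_steps) after one optimizer step."""
    if overflow:
        # Skip the weight update and halve the scale (never below min_scale).
        return max(scale / 2.0, min_scale), 0
    good_steps += 1
    if good_steps >= growth_interval:
        # A long run without overflow: try a larger scale again.
        return scale * 2.0, 0
    return scale, good_steps

# Simulate the pathological log above: every step overflows,
# so the scale collapses to the floor and stays there.
scale, good = 2.0 ** 15, 0
for _ in range(20):
    scale, good = step_loss_scaler(scale, overflow=True, good_steps=good)
print(scale)  # 1.0 -- matches "Using dynamic loss scale of 1"
```

If the real run shows this pattern from iteration ~7459 onward, the usual remedies are a lower `learning_rate`, longer `warmup_steps`, or restarting from the last healthy checkpoint rather than letting the run continue.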

Question 2: When I enable the `src/tgt_subword_vocab` parameter (the vocab file generated alongside the SentencePiece (`src_subword_model`) training), model training will not run. So I plan to skip SentencePiece segmentation of my own data and instead train a translation model on top of a pre-trained language model (e.g. BERT), but I can't figure out from the docs how to do this. Any help would be greatly appreciated!
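One possible cause of the `*_subword_vocab` failure: SentencePiece writes its `.vocab` file as `piece<TAB>log-probability` lines, while the `*_subword_vocab` option (used for vocabulary-restricted segmentation in the tokenizer transform) expects `token<TAB>count` lines, so feeding the raw spm file in can break parsing. A hedged sketch (assuming the count-per-line format your OpenNMT-py version documents; verify against its transforms docs) that rewrites the spm vocab with rank-based pseudo-counts:

```python
# Hedged sketch: convert a SentencePiece ".vocab" file
# (lines of "piece\tlog_prob") into "token\tcount" lines.
# The counts are synthetic (rank-based), since spm stores
# log-probabilities rather than frequencies.
import os
import tempfile

def convert_spm_vocab(spm_vocab_path, out_path):
    with open(spm_vocab_path, encoding="utf-8") as f:
        pieces = [line.split("\t")[0] for line in f if line.strip()]
    with open(out_path, "w", encoding="utf-8") as f:
        for rank, piece in enumerate(pieces):
            # Higher-ranked pieces get larger pseudo-counts, so any
            # frequency threshold keeps the most probable pieces.
            f.write(f"{piece}\t{len(pieces) - rank}\n")

# Demo on a tiny fake spm vocab:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "demo.vocab")
    dst = os.path.join(d, "demo.counts")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<unk>\t0\n\u2581the\t-2.5\nlo\t-3.1\n")
    convert_spm_vocab(src, dst)
    print(open(dst, encoding="utf-8").read())
    # <unk>  3
    # ▁the   2
    # lo     1
```

The pseudo-counts only need to preserve the spm ranking; if your setup uses a frequency threshold, real corpus counts would be more faithful.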

```
#.yaml
# TensorBoard parameters
tensorboard: true
tensorboard_log_dir: Train/log/tensorboard_logs
# Meta opts:
save_data: Train/vocab/vocab
## Where the vocab(s) will be written
# src_vocab: Train/vocab/tibetan.vocab
# tgt_vocab: Train/vocab/chinese.vocab
src_vocab: Train/vocab/vocab.src
tgt_vocab: Train/vocab/vocab.tgt
share_vocab: True
# Prevent overwriting existing files in the folder
overwrite: False
src_vocab_size: 71928
tgt_vocab_size: 71928
src_words_min_frequency: 10
tgt_words_min_frequency: 10
### Transform related opts:
#### Subword
src_subword_model: Train/vocab/tibetan.model
#src_subword_vocab: Train/vocab/tibetan.vocab
tgt_subword_model: Train/vocab/chinese.model
#tgt_subword_vocab: Train/vocab/chinese.vocab
src_subword_nbest: 3
tgt_subword_nbest: 3
src_subword_alpha: 0.01
tgt_subword_alpha: 0.01
src_subword_type: sentencepiece
tgt_subword_type: sentencepiece
src_onmttok_kwargs: "{'mode': 'aggressive', 'spacer_annotate': True}"
tgt_onmttok_kwargs: "{'mode': 'aggressive', 'spacer_annotate': True}"
#### Sampling
switchout_temperature: 1.0
tokendrop_temperature: 1.0
tokenmask_temperature: 1.0
#### Filter
src_seq_length: 512
tgt_seq_length: 512
#### BART
permute_sent_ratio: 0.0
rotate_ratio: 0.0
insert_ratio: 0.0
random_ratio: 0.0
mask_ratio: 0.0
mask_length: subword
poisson_lambda: 3.0
replace_length: 1
params:
    average_loss_in_time: true
# Corpus opts:
data:
    corpus_1:
        path_src: TRAIN_DATASET/bo_train.txt
        path_tgt: TRAIN_DATASET/zh_train.txt
        transforms: [sentencepiece]
    valid:
        path_src: TRAIN_DATASET/bo_val.txt
        path_tgt: TRAIN_DATASET/zh_val.txt
        transforms: [sentencepiece]
# Model configuration
save_model: Train/run/model.bo-zh
keep_checkpoint: 5
save_checkpoint_steps: 10000
average_decay: 0.0001
seed: 2345
report_every: 100
train_steps: 1000000
valid_steps: 10000
bucket_size: 262144
world_size: 2
gpu_ranks: [0, 1]
num_workers: 2
batch_type: "tokens"
batch_size: 2048
valid_batch_size: 10
batch_size_multiple: 8
accum_count: [3]
accum_steps: [0]
model_dtype: "fp16"
optim: "fusedadam"
learning_rate: 20
warmup_steps: 6000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 5.0
loss_scale: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"
encoder_type: transformer
decoder_type: transformer
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.2]
attention_dropout: [0.2]
share_decoder_embeddings: true
share_embeddings: true
position_encoding: true
```
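One thing worth checking against the overflow messages in Question 1: with `decay_method: "noam"`, the configured `learning_rate` is a multiplier on the Noam schedule, so the effective step size also depends on `hidden_size` and `warmup_steps`. A small sketch of the schedule as usually defined (assuming the standard "Attention Is All You Need" formula; OpenNMT-py's exact implementation may differ in details):

```python
# Noam learning-rate schedule (standard form):
#   lr(step) = base_lr * d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)

def noam_lr(step, base_lr=20.0, d_model=512, warmup=6000):
    step = max(step, 1)
    return base_lr * d_model ** -0.5 * min(step ** -0.5,
                                           step * warmup ** -1.5)

# The peak is reached at step == warmup:
print(f"peak lr ~ {noam_lr(6000):.4f}")  # about 0.0114
```

A peak near 0.011 is on the aggressive side for fp16 training; halving `learning_rate` or doubling `warmup_steps` is a common first response to persistent grad overflow.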