Hi,
I’m facing a really weird bug. I’m training an En–Fr model, and for any input sentence the inference keeps producing one and the same output sentence.
For training I’m using an 8M-sentence shuffled and cleaned subset of Gigafr, eubook, multiun and newscom, downloaded from opus.nlp.eu in Moses format.
The model is trained for 25,000 steps (~10 epochs).
The French corpus is not transliterated. Is that required for this language pair?
The dataset is shuffled, cleaned and preprocessed with a joint (shared En+Fr) BPE SentencePiece model.
The SentencePiece training parameters are:
spm_train
--input=train # 5M-sentence (2.5M En + 2.5M Fr) shuffled subset of the main corpus
--model_prefix=spm
--vocab_size=32000
--input_format=text
--num_threads=5
--input_sentence_size=4999995
--max_sentence_length=500
--shuffle_input_sentence=true
--character_coverage=0.9995
--model_type=bpe
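For completeness, after training the model I encode both sides of the corpus with the same joint model before running NMT training. A minimal sketch of that step, using the standard spm_encode CLI (the file names train.en, train.fr, train.en.sp, train.fr.sp are placeholders for my actual files):

spm_encode --model=spm.model --output_format=piece < train.en > train.en.sp
spm_encode --model=spm.model --output_format=piece < train.fr > train.fr.sp

The same spm.model is applied to both languages, since the vocabulary is joint.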
Any ideas what could cause this behavior?
Thank you!