I am training a character-level model on a corpus of 1 million sentence pairs using opennmt-py,
but the log shows
number of examples: 22310
What does "number of examples" mean?
Did something go wrong during my preprocessing?
This is the number of examples in the data shard that was just loaded.
Preprocessing 'cuts' the dataset(s) into several shards to speed up loading and reduce memory usage during training. Either you have several shards and this one happens to be small, or your preprocessing filters were too harsh. By default,
src_seq_length and tgt_seq_length are set to 50, which is quite low and might be too little for your data, especially at the character level where every character counts as a token.
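For example, you could re-run preprocessing with higher length limits. A minimal sketch (the file paths are placeholders, and the exact limit values are assumptions you should tune to your corpus):

```shell
# Raise the sequence-length filters so long character sequences are not dropped.
# src_seq_length / tgt_seq_length default to 50 tokens, which at character level
# means sentences longer than ~50 characters are filtered out.
onmt_preprocess \
    -train_src data/train.src -train_tgt data/train.tgt \
    -valid_src data/valid.src -valid_tgt data/valid.tgt \
    -save_data data/demo \
    -src_seq_length 300 \
    -tgt_seq_length 300
```

After re-running, check the "number of examples" reported per shard in the log; it should be much closer to your corpus size if length filtering was the cause.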