I am training a character-level model on a corpus of 1 million sentence pairs using opennmt-py,
but the log shows
number of examples: 22310
What does "number of examples" mean?
Did something go wrong during my preprocessing?
This is the number of examples in the data shard that was just loaded.
Preprocessing 'cuts' the dataset(s) into several shards to speed up loading and reduce memory usage during training. Either you have several shards and this one happens to be small, or your preprocessing filters were too harsh. By default,
src_seq_length and tgt_seq_length are set to 50, which is quite low and might be too little for your data, especially at the character level where every character counts as a token.
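For example, you could re-run preprocessing with higher length limits. A minimal sketch (the file paths are placeholders, and the exact limit values are assumptions you should tune to your corpus):

```shell
# Raise the sequence-length filters so long character sequences are not dropped.
# src_seq_length / tgt_seq_length default to 50 tokens, which at character level
# means sentences longer than ~50 characters are filtered out.
onmt_preprocess \
    -train_src data/train.src -train_tgt data/train.tgt \
    -valid_src data/valid.src -valid_tgt data/valid.tgt \
    -save_data data/demo \
    -src_seq_length 300 \
    -tgt_seq_length 300
```

After re-running, check the "number of examples" reported per shard in the log; it should be much closer to your corpus size if length filtering was the cause.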