What does "number of examples" mean?

kdminamoto · February 4, 2020, 9:32am

I am training a character-level model with OpenNMT-py on a corpus of 1 million sentence pairs,
but the log shows:
number of examples: 22310

What does “number of examples” mean?
Did something go wrong during my preprocessing?

Thank you

francoishernandez · February 4, 2020, 9:53am

This is the number of examples in the data shard that was just loaded.
Preprocessing ‘cuts’ the dataset(s) into several pieces (shards) to make loading easier and to use less memory during training. Either you have several shards and this one is a small one, or your preprocessing filtered out too many examples. By default -src_seq_length and -tgt_seq_length are set to 50, which is quite low and might be too little for your data. These limits are counted in tokens, so with character-level data any sentence longer than 50 characters would be dropped.
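
To get a feel for how much a length limit of 50 can remove at character level, here is a minimal Python sketch. It is not OpenNMT-py's actual filtering code; the helper names `to_chars` and `count_kept` are hypothetical, and it assumes each space-separated token counts toward the limit and that a pair is kept only if both sides fit.

```python
def to_chars(sentence):
    """Split a sentence into space-separated characters (hypothetical helper).

    Original spaces are replaced by "_" so they survive as tokens.
    """
    return " ".join(sentence.replace(" ", "_"))


def count_kept(pairs, max_src_len=50, max_tgt_len=50):
    """Count pairs whose source and target both fit within the token limits.

    Assumption: tokens are whitespace-separated, and an empty side is dropped.
    """
    kept = 0
    for src, tgt in pairs:
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        if 0 < src_len <= max_src_len and 0 < tgt_len <= max_tgt_len:
            kept += 1
    return kept


word_pairs = [
    ("Hello world", "Bonjour le monde"),
    ("This sentence is considerably longer than fifty characters in total.",
     "Cette phrase compte nettement plus de cinquante caractères au total."),
]
# Same pairs, but tokenized character by character.
char_pairs = [(to_chars(s), to_chars(t)) for s, t in word_pairs]

print("kept at word level:", count_kept(word_pairs))
print("kept at character level:", count_kept(char_pairs))
```

Running the same kind of count over your own corpus with different limits should tell you whether the length filter explains the small number of examples, or whether you are simply looking at one small shard.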