You need to set both -world_size, which is a count (the total number of GPUs), and -gpu_ranks, which is an index (the rank of each GPU, starting from 0).
Here is an example command for training on one GPU:
CUDA_VISIBLE_DEVICES=0 python3 train.py -data data/demo -save_model demo-model -world_size 1 -gpu_ranks 0
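To make the count-versus-index distinction concrete, here is a small sketch that builds the two flags for N GPUs on a single machine. The helper name gpu_args is hypothetical and not part of OpenNMT-py. It assumes one process per GPU, so -world_size is N and -gpu_ranks lists the indices 0 to N-1.

```python
from typing import List


def gpu_args(num_gpus: int) -> List[str]:
    """Hypothetical helper: build -world_size / -gpu_ranks flags for one machine.

    -world_size is a count (how many GPUs/processes in total),
    -gpu_ranks is a list of indices (0 .. num_gpus - 1).
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    ranks = [str(rank) for rank in range(num_gpus)]
    return ["-world_size", str(num_gpus), "-gpu_ranks", *ranks]


if __name__ == "__main__":
    # One GPU: matches the command above.
    print(" ".join(gpu_args(1)))
    # Two GPUs: count is 2, indices are 0 and 1.
    print(" ".join(gpu_args(2)))
```

For one GPU this yields "-world_size 1 -gpu_ranks 0", the same flags as the command above; for two GPUs it yields "-world_size 2 -gpu_ranks 0 1".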
As for early stopping, you can learn a lot by reading its code and the pull request that introduced it.
Basically, the arguments can look something like this:
-early_stopping 4 -early_stopping_criteria accuracy ppl
If you are training for 100,000 steps (the -train_steps default) and validating every 10,000 steps (the -valid_steps default), then you get 10 validation runs. So you can set -early_stopping 4 to stop training if there is no improvement for 4 consecutive validations. The default is 0, which means early stopping is disabled.
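The arithmetic and the patience idea can be sketched as follows. This is a simplified model, not OpenNMT-py's actual implementation: it tracks a single score (perplexity, where lower is better), and the function names and the perplexity values are made up for illustration.

```python
from typing import List, Optional

TRAIN_STEPS = 100_000  # -train_steps default
VALID_STEPS = 10_000   # -valid_steps default


def num_validations(train_steps: int, valid_steps: int) -> int:
    """How many validation runs happen during training."""
    return train_steps // valid_steps


def stop_at(ppl_history: List[float], patience: int) -> Optional[int]:
    """Simplified patience-based early stopping on perplexity (lower is better).

    Returns the 1-based validation index at which training would stop,
    or None if it never stops. patience <= 0 means early stopping is off.
    """
    if patience <= 0:
        return None
    best = float("inf")
    stalled = 0  # consecutive validations without improvement
    for i, ppl in enumerate(ppl_history, start=1):
        if ppl < best:
            best = ppl
            stalled = 0  # improvement resets the counter
        else:
            stalled += 1
            if stalled >= patience:
                return i
    return None


if __name__ == "__main__":
    print(num_validations(TRAIN_STEPS, VALID_STEPS))
    # Made-up perplexities: best is 22.0 at validation 3, then 4 stalls.
    history = [30.0, 25.0, 22.0, 22.5, 23.0, 22.1, 24.0, 21.0, 20.0, 19.0]
    idx = stop_at(history, patience=4)
    print(idx, idx * VALID_STEPS if idx else None)
```

With these defaults there are 10 validations. With the made-up history and patience 4, training stops at validation 7 (step 70,000), before the later improvements are ever seen. That trade-off is what the -early_stopping value controls.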
Looking into the code, when -early_stopping_criteria is not given, both accuracy and perplexity (PPL) are used by default (DEFAULT_SCORERS). So unless you want to change this, you do not need to set the argument; setting it explicitly as in the example above is still fine. However, to use -early_stopping_criteria at all, you must set -early_stopping to a number greater than 0.
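The interaction between the two options can be modeled with a small hypothetical function. resolve_criteria and DEFAULT_CRITERIA are illustrative names, not OpenNMT-py code. The string names "ppl" and "accuracy" follow the example above, and the logic only mirrors the behavior described in this answer: no criteria means both, and nothing applies unless -early_stopping is greater than 0.

```python
from typing import List, Optional

# Hypothetical stand-in for DEFAULT_SCORERS: perplexity and accuracy.
DEFAULT_CRITERIA = ["ppl", "accuracy"]


def resolve_criteria(early_stopping: int, criteria: Optional[List[str]]) -> List[str]:
    """Return the criteria that would actually be used (illustrative model only)."""
    if early_stopping <= 0:
        return []  # early stopping disabled, so -early_stopping_criteria is ignored
    if criteria is None:
        return list(DEFAULT_CRITERIA)  # not set: fall back to both scorers
    unknown = set(criteria) - set(DEFAULT_CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return list(criteria)


if __name__ == "__main__":
    print(resolve_criteria(0, ["accuracy"]))         # disabled
    print(resolve_criteria(4, None))                 # defaults
    print(resolve_criteria(4, ["accuracy", "ppl"]))  # explicit, same effect
```

Setting the criteria explicitly to both values gives the same set of scorers as leaving the option unset, which is why either form works.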
Hope this helps.