Question About the Number of Epochs

I am experimenting with English-French machine translation using Transformers. The dataset consists of 16M parallel sentences, and after applying BPE the vocabulary size comes down to roughly 92,000 for both English and French. I am training on 2 parallel GPUs with a batch size of 4096.

I have two questions:

  1. One epoch with the above data will be 16M/8192 steps. (Please correct me if I am wrong; my calculation is below.)
  2. How many epochs will be sufficient to achieve a respectable BLEU score?
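
For reference, here is how I arrived at the steps-per-epoch number, assuming the batch size of 4096 counts sentences per GPU:

```python
# My estimate, assuming the batch size counts sentences (per GPU)
sentences = 16_000_000
batch_size = 4096
num_gpus = 2

steps_per_epoch = sentences / (batch_size * num_gpus)
print(steps_per_epoch)  # ~1953 steps per epoch
```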

Many thanks in advance.

Hi there,

  1. Careful: your batch size of 4096 is probably in tokens, not sentences, so you have to take that into account in your estimate (see the sketch after this list). You can also check the training logs (the "loading dataset ..." messages) to see where you are in the dataset at a given step/time.
  2. It depends on your setup, but the Transformer can converge quite quickly, so you may get good results within a few tens of thousands of steps (though it will probably keep improving for longer).
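
As a rough illustration of point 1, here is a back-of-the-envelope steps-per-epoch estimate when the batch size is in tokens. The average tokens-per-sentence figure is an assumption, not something from your post, and whether 4096 is per GPU or global depends on your toolkit's configuration; plug in your own corpus statistics.

```python
# Rough steps-per-epoch estimate with a token-based batch size.
# avg_tokens_per_sentence is a hypothetical value; measure it on your BPE'd corpus.
sentences = 16_000_000          # parallel sentence pairs
avg_tokens_per_sentence = 25    # assumed average length after BPE
batch_size_tokens = 4096        # per-GPU batch size in tokens
num_gpus = 2

total_tokens = sentences * avg_tokens_per_sentence
tokens_per_step = batch_size_tokens * num_gpus
steps_per_epoch = total_tokens / tokens_per_step

print(f"~{steps_per_epoch:,.0f} steps per epoch")  # ~48,828 with these numbers
```

With these (assumed) numbers, one epoch is closer to ~49k steps than the ~2k steps you get from counting sentences, which is why the distinction matters when planning training length.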