Even with two GTX 780 GPUs, training takes an insanely long time. I need a larger vocabulary because of the high number of unknowns, but I don't want to compromise too much on quality. What is your best suggestion?
After 1.5 hours and 0 steps (not epochs!), I gave up. This is strange behavior on a pure Ubuntu system. In a Windows VirtualBox VM I get faster results with the same parameters (without GPU passthrough, which doesn't work).
There's indeed something very strange going on with the GPUs: training without them is definitely faster. I lowered the layers to 4 and the size to 800; now the first step is written in 13 minutes or so.
Did you try with 1 GPU only?
For handling unknowns, consider using BPE (see the documentation and search the forum). It's not a perfect solution either, but it is better than giant word vocabularies.
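To make the suggestion concrete: BPE learns a fixed number of merge operations from the training corpus, so rare words get split into more frequent subword units instead of becoming unknowns. Below is a minimal, self-contained sketch of the merge-learning step in plain Python (for real training you would use an existing tool such as subword-nmt or SentencePiece; the function names here are just illustrative):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge operations from a list of words (illustrative sketch)."""
    # Represent each word as space-separated characters plus an end-of-word marker.
    words = Counter(' '.join(w) + ' </w>' for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            syms = word.split()
            for a, b in zip(syms, syms[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        # Merge the best pair symbol-by-symbol in every word.
        merged = {}
        for word, freq in words.items():
            syms, out, i = word.split(), [], 0
            while i < len(syms):
                if i < len(syms) - 1 and (syms[i], syms[i + 1]) == best:
                    out.append(syms[i] + syms[i + 1])
                    i += 2
                else:
                    out.append(syms[i])
                    i += 1
            merged[' '.join(out)] = freq
        words = merged
    return merges

merges = learn_bpe(['low', 'low', 'lower', 'newest', 'newest'], 5)
```

The number of merges directly controls the subword vocabulary size, which is how BPE lets you cap the vocabulary without mapping rare words to a single unknown token.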
Thank you! I’ll try that as soon as I have finished the current model.
I have kept the vocabulary in the model down by using the -phrase_table option, which allows me to incorporate a dictionary of some 350,000 tokens. Although it only allows single tokens, you can get around this by joining multi-word translations with an underscore (e.g. systeembeheerder|||system_manager), which you can take out in post-processing.
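The post-processing step described above can be as simple as replacing underscores with spaces in the decoder output. A minimal sketch, assuming underscores in the translation only ever come from the phrase-table trick (adjust if your target text contains literal underscores):

```python
def restore_spaces(line):
    """Undo the underscore joining used for multi-word phrase-table entries."""
    # Assumption: every underscore in the output was introduced by a
    # phrase-table entry like systeembeheerder|||system_manager.
    return line.replace('_', ' ')

print(restore_spaces('the system_manager restarted the server'))
```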
Thank you for the tip, Terence!