Which translation system is used

Hi, I'm coming to you with some questions about translation and translation systems in general.
There are many translation systems (OpenNMT, fairseq, Marian, and many others), but are they all equivalent? I have a hard time knowing whether a model's translation quality depends on the framework used or simply on the quality and quantity of the data used to train it. I'm hesitating between OpenNMT, which I know works well and is easy to train, and fairseq, which is more complicated to master but may be more powerful (I know Facebook uses it, as do other tools like ModernMT, and I've seen their translations, which are very high quality). So I wonder whether it is possible to reach a BLEU score of 60 with OpenNMT and Argos Translate, using the same data it would take to reach that level with fairseq.
I'm thinking about building a collaborative training platform, so I'm looking for the right technology to use. I'm not questioning the legitimacy of OpenNMT at all (it works very well); it's just that I've never seen it deployed in systems with very high BLEU scores.

Thank you for your future answers!

Hi,

The training framework is mostly irrelevant to the final quality. Quality depends heavily on the data and the training procedure (data cleaning, preprocessing, training time, etc.). In particular, all frameworks will reach about the same quality when trained on the same data with the same training parameters.

However, frameworks come with default or recommended configurations that may be slightly better or worse for a fixed training time. We can confirm that Transformer models trained with OpenNMT-py’s recommended options or OpenNMT-tf’s --auto_config are competitive.
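For reference, here is a minimal sketch of what an auto-configured Transformer training run can look like with the OpenNMT-tf Python API. All file paths are placeholders, not taken from this thread; adapt them to your own corpus and vocabularies.

```python
import opennmt

# Placeholder paths for illustration only.
config = {
    "model_dir": "run/",
    "data": {
        "source_vocabulary": "data/src-vocab.txt",
        "target_vocabulary": "data/tgt-vocab.txt",
        "train_features_file": "data/train.src",
        "train_labels_file": "data/train.tgt",
        "eval_features_file": "data/valid.src",
        "eval_labels_file": "data/valid.tgt",
    },
}

# auto_config=True fills in the recommended Transformer training values
# (batch size, optimizer, learning rate schedule, etc.) mentioned above,
# so only the data configuration needs to be provided.
model = opennmt.models.TransformerBase()
runner = opennmt.Runner(model, config, auto_config=True)
runner.train(with_eval=True)
```

The onmt-main command-line entry point accepts the same --auto_config flag if you prefer not to write Python.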


Thanks for the information! I will go with OpenNMT; I find it easier to use.
