I’m a complete newbie at OpenNMT (and also at Python), and I’m trying to train a TR->SAH (Turkish-Sakha) model using OpenNMT-py on Ubuntu 20.04. I’ve trained my model for 1,000 steps (100,000 is too many for me) and I still get poor translation results. Is the lack of tokenization what makes my translations so poor?
I generally use both training and vocabulary files. Do I need both, or are the training files alone enough?
I also don’t know the minimum recommended number of training and validation steps, how to use the validation files, or how to tokenize from the terminal.
My laptop has an AMD graphics card, so I can only train on its CPU.
Thanks a lot, I appreciate it.