OpenNMT on Python in Windows: Not working

Loek · November 14, 2017, 9:24am

One step closer! But it seems I need to choose between running at the Python prompt or the Windows prompt and can’t have my cake and eat it. Please see the attached screenshot. As you can see, Python is there.

Loek · November 14, 2017, 9:24am

P.S. TensorFlow is there when I use import tensorflow as tf in the Python prompt.

Loek · November 14, 2017, 9:30am

So my guess is that I need to use import opennmt as nmt or something like that in the Python prompt. From there, how do I get access to commands like train?

guillaumekln · November 14, 2017, 9:31am

So python is 2.7 but py is 3.6. This is confusing.

As you installed TensorFlow for Python 3, you should:

Install pyyaml for Python 3 as well
Invoke OpenNMT-tf scripts with py

Loek · November 14, 2017, 9:39am

So Python finds main.py, but main.py has issues finding other stuff.

guillaumekln · November 14, 2017, 9:41am

Use:

py -m bin.main -h

This syntax is important as it adds the current directory to the Python paths.
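
The difference between the two invocations can be reproduced outside OpenNMT-tf. The following is a minimal sketch, not the project's actual layout: `demo_lib_xyz` and the throwaway `bin/main.py` it creates are hypothetical stand-ins. It runs the same file once as a script and once with `-m` from the same working directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_tree(root):
    """Create a tiny project where bin/main.py imports a sibling package."""
    (root / "bin").mkdir()
    (root / "demo_lib_xyz").mkdir()  # hypothetical package name
    (root / "bin" / "__init__.py").write_text("")
    (root / "demo_lib_xyz" / "__init__.py").write_text("VALUE = 42\n")
    (root / "bin" / "main.py").write_text(
        "import demo_lib_xyz\nprint(demo_lib_xyz.VALUE)\n"
    )


def run_in(root, args):
    """Run the current interpreter with `args`, using `root` as working directory."""
    env = dict(os.environ)
    # Drop settings that would change how sys.path is built.
    env.pop("PYTHONPATH", None)
    env.pop("PYTHONSAFEPATH", None)
    return subprocess.run(
        [sys.executable] + list(args),
        cwd=str(root),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


def compare_invocations():
    """Return (result of `-m bin.main`, result of `bin/main.py`)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_tree(root)
        as_module = run_in(root, ["-m", "bin.main"])
        as_script = run_in(root, [os.path.join("bin", "main.py")])
    return as_module, as_script


if __name__ == "__main__":
    as_module, as_script = compare_invocations()
    print("-m bin.main  exit code:", as_module.returncode)
    print("bin/main.py  exit code:", as_script.returncode)
    print(as_script.stderr.strip().splitlines()[-1])
```

Run as a script, Python puts the `bin` directory itself at the front of the import path, so the sibling package is not found. With `-m`, the current directory is searched instead.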

Loek · November 14, 2017, 9:43am

Thank you. But then it has issues finding yaml, which is installed.

(I feel for the poor translator, who usually has no programming background whatsoever, who ever needs to get this running!)

guillaumekln · November 14, 2017, 9:47am

You have conflicting Python versions.

You installed TensorFlow for Python 3 and PyYAML for Python 2. You should select one version and install all required packages for that version.
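
A quick way to see which interpreter a given command starts, and whether that interpreter can see both packages, is a small check script. This is only a sketch; the file name `check_env.py` used below is hypothetical.

```python
import importlib.util
import sys


def describe_environment(packages):
    """Report the running interpreter and which of `packages` it can import."""
    found = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "packages": found,
    }


if __name__ == "__main__":
    # "yaml" is the import name of the PyYAML package.
    report = describe_environment(["tensorflow", "yaml"])
    print("interpreter:", report["executable"])
    print("version:", ".".join(str(part) for part in report["version"]))
    for name, available in report["packages"].items():
        print(name, "->", "found" if available else "MISSING")
```

Saving this as `check_env.py` and running it once with `python check_env.py` and once with `py check_env.py` should show whether the two commands point at different installations and which one is missing a package.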

Loek · November 14, 2017, 10:32am

Ah, fantastic. Python 3.6 for Windows installs 2.7 with it by default, which is indeed very confusing. I just killed the entire 2.7 directory, then installed YAML via Git for Windows and voila, main is responding! I’ll keep on playing with this. Thank you very much for your help so far!

Loek · November 14, 2017, 12:57pm

It seems the sample files have codes in them that are not allowed?

Loek · November 14, 2017, 1:46pm

Yes, no matter what file I feed it, I always get UnicodeDecodeError: 'cp932' codec can't decode byte 0xef in position xxxx: illegal multibyte sequence during the train phase.
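
This kind of error typically appears when UTF-8 text is decoded with the Windows locale encoding (cp932 on a Japanese system), which is what Python 3's `open()` uses when no encoding is given. The sketch below only illustrates the mechanism with a made-up sample string; it makes no claim about how OpenNMT-tf or TensorFlow open their files.

```python
import os
import tempfile

# Sample text containing U+FF0C (fullwidth comma); its UTF-8 form starts with 0xEF.
SAMPLE = "Hello\uff0cworld"


def can_decode(data, encoding):
    """Return True if `data` (bytes) decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def read_utf8(path):
    """Read a text file as UTF-8 regardless of the system locale.

    "utf-8-sig" also drops a leading byte order mark if the file has one.
    """
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()


def round_trip(text):
    """Write `text` as UTF-8 to a temporary file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(text.encode("utf-8"))
        return read_utf8(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    raw = SAMPLE.encode("utf-8")
    print("decodes as cp932:", can_decode(raw, "cp932"))
    print("decodes as utf-8:", can_decode(raw, "utf-8"))
    print("read back intact:", round_trip(SAMPLE) == SAMPLE)
```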

Loek · November 14, 2017, 3:03pm

I can add sitecustomize.py to Python36/Lib stating:

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

Now the program does run, but it quits without any result or error whatsoever. It starts, and stops. All I see is my prompt again.
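
One thing worth checking here, offered as an assumption that nothing in this thread confirms: the `codecs.getwriter` wrapper is a Python 2 idiom. On Python 3, `sys.stdout` and `sys.stderr` are already text streams, so the wrapper hands encoded bytes to a stream that expects `str`. If `sys.stderr` is wrapped that way too, error messages may be unable to print. The sketch below reproduces the mismatch with in-memory streams standing in for the real ones.

```python
import codecs
import io


def write_through(stream, text):
    """Wrap `stream` in a UTF-8 codecs writer and try to write `text`.

    Returns True if the write succeeded, False if the stream rejected it.
    """
    writer = codecs.getwriter("utf8")(stream)
    try:
        writer.write(text)
    except TypeError:
        # The writer produced bytes, but the underlying stream only takes str.
        return False
    return True


if __name__ == "__main__":
    text_stream = io.StringIO()   # behaves like Python 3's sys.stdout
    byte_stream = io.BytesIO()    # behaves like sys.stdout.buffer

    print("wrapping a text stream works:", write_through(text_stream, "caf\u00e9"))
    print("wrapping a byte stream works:", write_through(byte_stream, "caf\u00e9"))
    print("bytes written:", byte_stream.getvalue())
```

If this is what happens, wrapping `sys.stdout.buffer` and `sys.stderr.buffer` instead would be the Python 3 counterpart. Note that this only affects what is printed, not how input files are decoded.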

guillaumekln · November 14, 2017, 4:10pm

What command did you run?

Loek2 · November 14, 2017, 4:32pm

Answering from another account as my first has reached its posting limit for the first day after registration.

python -m bin.main train --model config/models/nmt_small.py --config config/opennmt-defaults.yml config/data/xxx.yml

Here xxx is the name of the YAML file in question. It works neither with the test files nor with my own files: both show the same symptom.
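
When a command returns to the prompt without printing anything, its exit code and captured error stream are often the only clues. A minimal sketch for collecting both follows. The child process in the demo is a stand-in that exits silently, not OpenNMT-tf itself.

```python
import subprocess
import sys


def run_and_report(command):
    """Run `command` (a list of arguments) and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in child: exits with status 3 and prints nothing.
    code, out, err = run_and_report(
        [sys.executable, "-c", "import sys; sys.exit(3)"]
    )
    print("exit code:", code)
    print("stdout:", repr(out))
    print("stderr:", repr(err))
```

To apply it to the command above, the list would start with `sys.executable, "-m", "bin.main", "train"` followed by the same arguments. A nonzero exit code with empty output would suggest the process failed before it could report anything.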

Loek2 · November 15, 2017, 11:06am

Should I conclude that OpenNMT does not run on Windows?

guillaumekln · November 15, 2017, 1:31pm

To be clear, Windows support is not a priority, as the vast majority of people use Linux systems for this kind of application. But if TensorFlow works correctly on Windows, it would be nice if OpenNMT-tf worked there as well.

Could you open an issue on GitHub? I will try to debug that when I have my hands on a Windows system. Thanks.

Loek · November 15, 2017, 1:55pm

Done!

I would use Linux for this, but buying another 6K laptop with two GPUs just for experimenting is a tad too much to ask, I’m afraid. This is my main system, and setting up a dual boot on it is a bit scary. Unfortunately, Windows is the only viable system to use on work computers in an office environment. It’s also the main system used by translators, because their clients use it, and because their translation software, like memoQ and Trados, runs on it.

panosk · November 15, 2017, 3:47pm

Hi @Loek,

I can assure you that the best way to use MT frameworks is to have a separate Linux system and use a service to connect it to the CAT tools running on Windows. You don’t have to build a very expensive Linux server for a single-user or small-team setup. Actually, the only expensive component for OpenNMT should be a ~600 euro GeForce 1080. Besides, it’s impractical to use OpenNMT on the same machine you translate on, as it takes a lot of resources and time, at least for training.
I don’t know your use case, but OpenNMT has already paid for the GeForce 1080 I bought a few months ago.

Loek · November 15, 2017, 4:57pm

What would be best is a live service that continuously updates while the newest translations are fed to it. Is that what https://www.modernmt.eu/ does?

panosk · November 15, 2017, 5:19pm

I agree, that would be ideal, but AFAIK there is still the limitation that, in order to fully benefit from new or updated data, a new engine must be trained (both for SMT and NMT). I’m not familiar with ModernMT, as I only became aware of it a few days ago, but it looks very interesting. It seems to use Moses and the Python version of OpenNMT, with an extensive REST API. I’m really curious, though, whether the update function of the API actually applies to neural engines too.