Running OpenNMT-tf within Docker with nvidia-docker

Hi,

Having done some experiments with TensorFlow on CPU, I've installed tensorflow-gpu within Docker using nvidia-docker and the tests run nicely. Could somebody please point me to documentation or a HowTo for using OpenNMT-tf with this set-up?

Thanks
Terence

Hi,

It really depends on how you want to use it.

You could first build a custom image that includes OpenNMT-tf, and then invoke docker run as if you were running the actual script:

FROM tensorflow/tensorflow:1.11.0-gpu
WORKDIR /root
RUN pip install OpenNMT-tf
ENTRYPOINT []

Then build the image and run onmt-main through it:

docker build -t opennmt/opennmt-tf -f Dockerfile .
nvidia-docker run -it --rm --entrypoint onmt-main opennmt/opennmt-tf -h

Of course you need to mount the directories you want to access within the Docker container.
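For example, to train with data and configuration stored on the host, you could mount your working directory with -v. A rough sketch, assuming OpenNMT-tf 1.x and a hypothetical config.yml in the current directory (adjust paths and names to your setup):

# Mount the current directory into the container and train using a
# configuration file that lives on the host (hypothetical names):
nvidia-docker run -it --rm -v $PWD:/root/workspace --entrypoint onmt-main opennmt/opennmt-tf train_and_eval --model_type Transformer --auto_config --config /root/workspace/config.yml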

Thanks - that’s a good starting point!

Hi @guillaumekln, it just kept on building until it filled the disk. Can OpenNMT-tf really occupy 824.5 GB? And I don't understand why it should be writing into an OpenNMT subdirectory that is not referenced anywhere in the Dockerfile. Any idea what could be happening?
Regards,
Terence

824.5 GB is the size of the build context passed to docker build: Docker sends the entire content of the directory you build from to the daemon. See the PATH argument in the docker build command documentation.
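To keep the context small, you can build from a directory that contains only the Dockerfile, or exclude large directories with a .dockerignore file. A minimal sketch (the directory names are only examples):

# Option 1: build from a directory containing nothing but the Dockerfile
mkdir docker-build
cp Dockerfile docker-build/
docker build -t opennmt/opennmt-tf docker-build

# Option 2: exclude large directories (e.g. data, models) from the context
printf 'data/\nmodels/\nOpenNMT/\n' > .dockerignore
docker build -t opennmt/opennmt-tf -f Dockerfile .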

You should adapt the commands I shared above to your setup.

Hi,
I’m trying to install OpenNMT-tf with tensorflow-gpu using Docker.
My Dockerfile looks like this:

FROM tensorflow/tensorflow:1.14.0-gpu
WORKDIR /root
RUN pip install OpenNMT-tf
ENTRYPOINT []

The first command, docker build -t opennmt/opennmt-tf -f Dockerfile ., gives me no trouble. The second one gives me error messages and I have no idea how to get rid of them:

WARNING: Logging before flag parsing goes to stderr.
W0829 15:45:47.053921 139708002342720 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/opennmt/decoders/rnn_decoder.py:435: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.

W0829 15:45:47.517523 139708002342720 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/opennmt/optimizers/adafactor.py:32: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0829 15:45:47.518002 139708002342720 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/opennmt/optimizers/multistep_adam.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

My first attempt with OpenNMT Lua with GPU and Docker worked fine. It was probably easier too.

Best regards
Norbert

Hi, it’s not quite a direct answer, but have you tried nmtwizard/opennmt-tf? I find that it works nicely. Here’s my command for inference via a server on a laptop without a GPU:
docker run -p 5000:5000 -v e:\tf_models\serving:/tf_models/serving nmtwizard/opennmt-tf --model %1 --model_storage /tf_models/serving serve --host 0.0.0.0 --port 5000

These are warnings. You can just ignore them.

I can ignore the warnings, but I am sure the container is not available or was not built.

The command above uses --rm, which deletes the container when it exits.
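You can verify what actually exists on your machine with the standard Docker listing commands:

# List local images; opennmt/opennmt-tf should appear here after a successful build
docker images

# List all containers, including stopped ones
# (a container started with --rm will not appear here after it exits)
docker ps -a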

Hi, thanks for the tip. I tried it first without mounting directories, but it does not work.
The easy way: sudo nvidia-docker run nmtwizard/opennmt-tf
I get some warnings and this error message: entrypoint.py: error: too few arguments.

Hi, Sorry: the script I gave you was for running on my Windows laptop without a GPU. The Linux GPU command is:
CUDA_VISIBLE_DEVICES=0 nvidia-docker run -p 5000:5000 -v $PWD:/home/miguel nmtwizard/opennmt-tf --model ned2eng_0104 --model_storage /home/miguel/tf_experiments/ned2eng_tf/serving serve --host 0.0.0.0 --port 5000
Both are running as I write this. You need to make sure everything is adapted to your local system.
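Once the server is up, you can send it a test request over the REST API. If I remember the nmt-wizard entrypoint correctly, it exposes a /translate route taking a JSON body like the one below; treat the exact payload as an assumption and check the nmt-wizard-docker documentation for your image version:

# Hypothetical test request against the serving container started above
curl -X POST http://localhost:5000/translate -H 'Content-Type: application/json' -d '{"src": [{"text": "Hello world!"}]}'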