Tensorboard not working on OpenNMT-py

seamusl · February 6, 2021, 2:42pm

I have OpenNMT-py version 2 working in a Colab environment. To get it working, I have to install the following packages:

!pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html

If I don’t run this command, training won’t start: OpenNMT-py 2.0 won’t work on the NVIDIA graphics card without a torch 1.7.1 build that is compatible with CUDA 10.1.

Using the above, training works fine and I can build good models.

However, when I try to use TensorBoard, I run into problems. I have the following in my config.yaml file, and I have set up TensorBoard correctly within Colab (i.e. TensorBoard loads but it has nothing to display):

Details from my config.yaml:

# Logging
tensorboard: true
tensorboard_log_dir: runs/test

When I run the following command:

onmt_train -config config.yaml

TensorBoard logging doesn’t work: no TensorBoard log files are written, and the program just crashes with the error shown below.
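One way to confirm whether any log files were written at all is to look for TensorBoard event files under the configured log directory. The sketch below is only an illustration: `find_event_files` is a hypothetical helper name, and it assumes the event files follow TensorBoard's usual `events.out.tfevents.*` naming and may sit in a subdirectory of `runs/test`.

```python
import glob
import os


def find_event_files(log_dir):
    """Return every TensorBoard event file found at or below log_dir."""
    # "**" with recursive=True matches log_dir itself and all subdirectories.
    pattern = os.path.join(log_dir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


# "runs/test" is the tensorboard_log_dir value from the config above.
files = find_event_files("runs/test")
print(f"{len(files)} event file(s) found")
for path in files:
    print(path)
```

If this reports zero files after a training run, TensorBoard has nothing to display because nothing was logged, not because the viewer is misconfigured.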

Any feedback would be appreciated since it’s important for me to have tensorboard working. Thanks.

Séamus.

ERROR ON RUNNING:
File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 772, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'torch==1.6.0' distribution was not found and is required by OpenNMT-py
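The error says the installed torch does not match the exact pin `torch==1.6.0` declared by OpenNMT-py. A simplified sketch of that comparison follows. It is not the `pkg_resources` implementation: `satisfies_exact_pin` is a hypothetical helper, and it assumes only the PEP 440 rule that a local label such as `+cu101` on the installed version is ignored when the pin itself carries no local label. It does not normalize versions (for example `1.6` versus `1.6.0`).

```python
def strip_local(version):
    """Drop a PEP 440 local version label such as '+cu101'."""
    return version.split("+", 1)[0]


def satisfies_exact_pin(installed, requirement):
    """Return True if `installed` meets a 'name==version' requirement."""
    _name, _, pinned = requirement.partition("==")
    if "+" in pinned:
        # The pin names a local label, so the match must be exact.
        return installed == pinned
    # Otherwise the local label on the installed version is ignored.
    return strip_local(installed) == pinned


print(satisfies_exact_pin("1.7.1+cu101", "torch==1.6.0"))  # False
print(satisfies_exact_pin("1.6.0+cu101", "torch==1.6.0"))  # True
```

Under these assumptions, the CUDA 10.1 build of torch 1.7.1 fails the pin, while a CUDA 10.1 build of torch 1.6.0 would satisfy it.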

seamusl · February 6, 2021, 5:26pm

The following fixes it easily enough:

pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

i.e. I used the above to replace the following:
!pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html

I just needed a build of torch 1.6.0 that was compatible with CUDA 10.1.
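After reinstalling, a quick sanity check in a Colab cell can confirm which torch build is active. This is a minimal sketch: `describe_torch` is a hypothetical helper name, and it returns `None` when torch is not importable so that it can run in any environment.

```python
def describe_torch():
    """Report the installed torch version and its CUDA build, if any."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    return {
        "torch": torch.__version__,           # e.g. a string ending in "+cu101"
        "cuda_build": torch.version.cuda,     # CUDA version torch was built with
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

With the fix above, the reported torch version should start with 1.6.0, and `cuda_available` should be true on a GPU runtime.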