Issue loading pretrained models for machine translation

yiulau · August 14, 2019, 10:00pm

I get this error when trying to load the pretrained model available at http://opennmt.net/Models-py/

Here is the German-English - 2-layer BiLSTM model:

Traceback (most recent call last):
  File "load_model.py", line 2, in <module>
    model = torch.load("iwslt-brnn2.s131_acc_62.71_ppl_7.74_e20.pt")
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torch/serialization.py", line 573, in _load
    result = unpickler.load()
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

Edit: Actually, the same error occurs for the English-German Transformer model as well:

Traceback (most recent call last):
  File "load_model.py", line 3, in <module>
    model = torch.load("averaged-10-epoch.pt")
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torch/serialization.py", line 573, in _load
    result = unpickler.load()
  File "/home/yiu/.conda/envs/tf1torchpy36/lib/python3.6/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

PyTorch version = 1.2.0
Torchtext version = 0.4.0
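
For anyone reproducing this, the installed versions can be collected in one step. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is made up for illustration. Note that a version string alone may not identify the exact build that is installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


report = installed_versions(["torch", "torchtext"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```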

Bachstelze · August 15, 2019, 6:42pm

OpenNMT-py isn’t yet compatible with PyTorch 1.2.0. See the GitHub issue: https://github.com/OpenNMT/OpenNMT-py/issues/1524
But I don’t know whether this is what causes your error. Which command did you run?

yiulau · August 15, 2019, 6:57pm

I’m taking the pretrained models from the website and loading them with torch.load:

model = torch.load("averaged-10-epoch.pt")

yiulau · August 15, 2019, 7:16pm

I just tested, and I am getting the same error even when running with PyTorch 1.1.

Bachstelze · August 15, 2019, 9:51pm

It could also be a problem with torchtext. Since version 0.4.0 the unk token isn’t hard-coded anymore (so it wasn’t necessary in the dictionary).
What exactly do you want to do with your load_model.py? Perhaps you could use the terminal commands instead?
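
To illustrate the failure mode in the traceback, here is a minimal sketch, assuming the pickled state was saved without an `unk_index` key. `StrictVocab`, `TolerantVocab`, and `restore` are hypothetical stand-ins, not torchtext's actual classes; only the `state['unk_index']` lookup is taken from the traceback.

```python
class StrictVocab:
    """Hypothetical vocab whose __setstate__ requires an 'unk_index' key."""

    def __setstate__(self, state):
        # Same lookup as in the traceback: raises KeyError on older state.
        if state["unk_index"] is None:
            pass
        self.__dict__.update(state)


class TolerantVocab:
    """Hypothetical variant that treats a missing 'unk_index' as None."""

    def __setstate__(self, state):
        state = dict(state)  # copy so the caller's dict is left untouched
        state.setdefault("unk_index", None)
        self.__dict__.update(state)


def restore(cls, state):
    """Rebuild an object from a state dict, as unpickling would."""
    obj = cls.__new__(cls)
    obj.__setstate__(state)
    return obj


# State as an older version might have saved it: no 'unk_index' entry.
old_state = {
    "itos": ["<unk>", "<pad>", "hello"],
    "stoi": {"<unk>": 0, "<pad>": 1, "hello": 2},
}

try:
    restore(StrictVocab, old_state)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]

vocab = restore(TolerantVocab, old_state)
print("strict restore failed on key:", strict_error)
print("tolerant restore unk_index:", vocab.unk_index)
```

The tolerant variant is only one possible way to handle older state; it is not claimed to be what the torchtext fix does.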

francoishernandez · August 19, 2019, 4:27pm

Indeed, there seem to be some breaking changes in the latest release of torchtext.
Not sure what they’ve been doing. On an ‘old’ 0.4.0 it was loading fine, but I just updated to the latest release and get the same error.

francoishernandez · August 19, 2019, 4:49pm

Just asked on the related PR on pytorch/text. They should fix it.
(https://github.com/pytorch/text/pull/531)
In the meantime, you can use a torchtext version from before this PR:
pip install --upgrade git+https://github.com/pytorch/text@a63e45e569aa61ca238cca41c49e01dda34466b0

francoishernandez · August 21, 2019, 8:23am

@yiulau @Bachstelze this should be fixed by PR #591 on torchtext.

guillaumekln · February 17, 2020, 3:09pm

Ah, we have this issue in CTranslate2 as well:

@francoishernandez Maybe I missed something but it seems the model can still be loaded by OpenNMT-py without error (with the “new” torchtext 0.4.0). Do you know why?

Yeah, I found the same thing. Looks like they released two different 0.4.0 versions at some point… For reference, torchtext 0.5.0 includes the fix.

francoishernandez · February 17, 2020, 3:25pm

Looks like they released a third 0.4.0 with the fix…