A version-related torchtext question about the _dynamic_dict function

fchen92 · March 21, 2022, 6:34am

I’m using the OpenNMT project, whose _dynamic_dict function looks like this:

    def _dynamic_dict(self, examples_iter):
        for example in examples_iter:
            src = example["src"]
            src_vocab = torchtext.vocab.Vocab(Counter(src))
            self.src_vocabs.append(src_vocab)
            # Mapping source tokens to indices in the dynamic dict.
            src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
            example["src_map"] = src_map

            if "tgt" in example:
                tgt = example["tgt"]
                mask = torch.LongTensor(
                        [0] + [src_vocab.stoi[w] for w in tgt] + [0])
                example["alignment"] = mask
            yield example

It keeps raising an error with a traceback like this:

  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 369, in _dynamic_dict
    src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 369, in <listcomp>
    src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
  File "C:\Python37\64bit\bts\lib\site-packages\torch\nn\modules\module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'Vocab' object has no attribute 'stoi'

Then I realized it’s a torchtext version problem, so instead of using Vocab.stoi[], I switched to the Vocab.get_stoi() method to get the stoi dictionary. Specifically, I changed

    src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])

to

    stoi = src_vocab.get_stoi()
    src_map = torch.LongTensor([stoi[w] for w in src])

so that the mapping uses the returned stoi dictionary. That produced another error, with this traceback:

  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 370, in _dynamic_dict
    stoi = src_vocab.get_stoi()
  File "C:\Python37\64bit\bts\lib\site-packages\torchtext\vocab\vocab.py", line 149, in get_stoi
    return self.vocab.get_stoi()
AttributeError: 'Counter' object has no attribute 'get_stoi'

However, printing the type of src_vocab gives <class 'torchtext.vocab.vocab.Vocab'>, and according to the docs my torchtext version, 0.11.2, does have the .get_stoi() method for getting the mapping dictionary. Can anybody help? A fix for either the original problem or the later one would be great. I’ve seen other people hit the original problem, but with few constructive answers, so I assume some of you have been in this situation before.

-------------------------------------UPDATE-------------------------------------

Simply using Vocab.__getitem__() instead of Vocab.stoi[] gets past this specific error, but another problem then occurs for the same reason. So src_vocab is an instance on which Vocab.__getitem__() can be called but Vocab.get_stoi() cannot.
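
One possible reading of the second traceback, offered as a hedged sketch rather than a statement about torchtext internals: the frame `return self.vocab.get_stoi()` suggests that the newer Vocab stores the object it is constructed with and forwards calls to it, so Vocab(Counter(src)) ends up forwarding to the Counter. The stand-in class below is hypothetical (WrapperVocab is an illustrative name, not torchtext code) and only mimics that delegation pattern. It also assumes that __getitem__ is forwarded the same way, in which case indexing would return token counts rather than indices.

```python
from collections import Counter


class WrapperVocab:
    """Hypothetical stand-in that forwards every call to the wrapped object."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __getitem__(self, token):
        # Forwarded to the wrapped object; for a Counter this is a frequency.
        return self.vocab[token]

    def get_stoi(self):
        # Forwarded as well; a Counter has no such method.
        return self.vocab.get_stoi()


src = ["the", "cat", "saw", "the", "dog"]
wrapped = WrapperVocab(Counter(src))

# Indexing "works", but the value is the token's count, not an index.
count_not_index = wrapped["the"]

try:
    wrapped.get_stoi()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

If that assumption holds for the installed version, the values that __getitem__ puts into src_map would be frequencies, so it may be worth printing a few of them before trusting the workaround.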

torchtext.vocab.vocab — torchtext 0.12.0a0+1fb2aed documentation

Only the methods with double underscores (such as __getitem__) work. The other methods still raise an AttributeError, for example: AttributeError: 'Counter' object has no attribute 'lookup_token'. I tried to reimplement the methods without underscores on top of the ones with underscores, but they still cannot give a two-directional mapping over a specific, limited set of tokens. So what should I do?
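
For the two-directional mapping itself, one option that does not depend on the installed torchtext version is to build the per-example dictionary by hand. The following is a minimal sketch, not OpenNMT's or torchtext's implementation: build_dynamic_vocab, the "<unk>"/"<pad>" specials, and the frequency-then-alphabetical ordering are all assumptions made for illustration, with unknown tokens mapped to index 0.

```python
from collections import Counter

UNK_INDEX = 0  # assumption: the first special token is "<unk>"


def build_dynamic_vocab(tokens, specials=("<unk>", "<pad>")):
    """Return (stoi, itos) for the source tokens of one example."""
    counts = Counter(tokens)
    for special in specials:
        counts.pop(special, None)
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos


src = ["the", "cat", "saw", "the", "dog"]
tgt = ["the", "dog", "ran"]

stoi, itos = build_dynamic_vocab(src)

# Token -> index, falling back to UNK_INDEX for tokens not in the source.
src_map = [stoi.get(w, UNK_INDEX) for w in src]
alignment = [0] + [stoi.get(w, UNK_INDEX) for w in tgt] + [0]

# Index -> token, the reverse direction.
recovered = [itos[i] for i in src_map]
```

In _dynamic_dict, the two lists could then be wrapped with torch.LongTensor(...) as before, and the (stoi, itos) pair stored in self.src_vocabs in place of the Vocab object, provided the rest of the code only needs these two lookups.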

P.S.: I would really like to roll back to a version where .stoi[] works, but there seems to be no torch==? / torchtext==? / cuda==11.2 combination that still has the Vocab.stoi attribute. Is there one?

guillaumekln · March 22, 2022, 9:00am

OpenNMT-py has the requirement torchtext==0.5.0. Did you try with this version? As far as I know it works with recent PyTorch releases.