I'm using the OpenNMT project, whose `_dynamic_dict` function looks like this:
```python
def _dynamic_dict(self, examples_iter):
    for example in examples_iter:
        src = example["src"]
        src_vocab = torchtext.vocab.Vocab(Counter(src))
        self.src_vocabs.append(src_vocab)
        # Mapping source tokens to indices in the dynamic dict.
        src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
        example["src_map"] = src_map
        if "tgt" in example:
            tgt = example["tgt"]
            mask = torch.LongTensor(
                [0] + [src_vocab.stoi[w] for w in tgt] + [0])
            example["alignment"] = mask
        yield example
```
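For context, here is a plain-Python sketch of what the loop above computes per example (no torchtext; the tokens are made up, and I'm ignoring the `<unk>`/`<pad>` specials that old torchtext's `Vocab` prepends):

```python
# Plain-Python sketch of the dynamic-dict mapping (hypothetical tokens;
# ignores the special tokens old torchtext's Vocab would prepend).
src = ["the", "cat", "sat", "the"]
tgt = ["cat", "sat"]

stoi = {}
for tok in src:
    stoi.setdefault(tok, len(stoi))   # first-seen order: the=0, cat=1, sat=2

src_map = [stoi[tok] for tok in src]                       # [0, 1, 2, 0]
# Old Vocab.stoi is a defaultdict that returns 0 for unknown tokens,
# so .get(tok, 0) mimics it; the alignment is padded with 0 on both sides.
alignment = [0] + [stoi.get(tok, 0) for tok in tgt] + [0]  # [0, 1, 2, 0]
print(src_map, alignment)
```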
It kept reporting errors, with a traceback like:

```
  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 369, in _dynamic_dict
    src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 369, in <listcomp>
    src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
  File "C:\Python37\64bit\bts\lib\site-packages\torch\nn\modules\module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'Vocab' object has no attribute 'stoi'
```
Then I realized it's a torchtext version problem, so instead of using `Vocab.stoi[]` I switched to the `Vocab.get_stoi()` method to get the `stoi` dictionary. To be specific, I changed

```python
src_map = torch.LongTensor([src_vocab.stoi[w] for w in src])
```

to

```python
stoi = src_vocab.get_stoi()
src_map = torch.LongTensor([stoi[w] for w in src])
```

so I get the `stoi` dictionary first and then do the same mapping procedure. But then I got another error, with this traceback:
```
  File "C:\Users\me\project-main\onmt\io\TextDataset.py", line 370, in _dynamic_dict
    stoi = src_vocab.get_stoi()
  File "C:\Python37\64bit\bts\lib\site-packages\torchtext\vocab\vocab.py", line 149, in get_stoi
    return self.vocab.get_stoi()
AttributeError: 'Counter' object has no attribute 'get_stoi'
```
But I've printed the type of `src_vocab` and it is `<class 'torchtext.vocab.vocab.Vocab'>`, and I've checked the docs: my version of torchtext (0.11.2) definitely has the `.get_stoi()` method for getting the mapping dictionary. Can anybody help? An answer to either the original problem or the later one would be great. I've seen other people hit the same original problem with few constructive answers, so I assume some of you have been in this situation before.
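For anyone reproducing this, a minimal stand-in sketch (emulating, not importing, the new torchtext wrapper; `VocabLike` is a hypothetical class of mine) shows why the traceback blames a `Counter` even though the object prints as a `Vocab`: the wrapper just delegates to whatever backend object it was constructed with.

```python
from collections import Counter

class VocabLike:
    """Hypothetical stand-in for the new-style torchtext Vocab wrapper."""
    def __init__(self, vocab):
        # The wrapper expects a backend vocab object here,
        # but in my call it effectively receives a Counter.
        self.vocab = vocab

    def get_stoi(self):
        return self.vocab.get_stoi()  # delegates to the stored object

try:
    VocabLike(Counter(["a", "b"])).get_stoi()
except AttributeError as e:
    print(e)  # 'Counter' object has no attribute 'get_stoi'
```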
-------------------------------------UPDATE-------------------------------------
Simply using `Vocab.__getitem__()` instead of `Vocab.stoi[]` works for this specific problem, but another error occurred for the same reason. So `src_vocab` is some instance that can call `Vocab.__getitem__()` but not `Vocab.get_stoi()`.

Per the docs (torchtext.vocab.vocab — torchtext 0.12.0a0+1fb2aed documentation), only the methods with underscores (like `__getitem__`) work; the other methods still raise `AttributeError: 'Counter' object has no attribute 'lookup_token'`. I tried to rewrite the underscore-free methods myself on top of the underscore ones, but found that they still cannot produce a two-directional mapping over a clearly defined, limited token set. So what should I do?
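For reference, the two-directional mapping I'm after can be built directly from the token list in plain Python (a sketch; `build_dynamic_dict` is my own hypothetical helper, not a torchtext API):

```python
from collections import Counter

def build_dynamic_dict(tokens):
    """Build a two-directional mapping (stoi and itos) from a token list."""
    itos = list(Counter(tokens))                   # unique tokens, first-seen order
    stoi = {tok: i for i, tok in enumerate(itos)}  # reverse direction
    return stoi, itos

stoi, itos = build_dynamic_dict(["the", "cat", "the", "sat"])
print(stoi)  # {'the': 0, 'cat': 1, 'sat': 2}
print(itos)  # ['the', 'cat', 'sat']
```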
P.S.: I'd really like to roll back to a version where `.stoi[]` works, but there seems to be no torch==?~torchtext==?~cuda==11.2 version combination that still has the `Vocab.stoi` attribute. Is there one?
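In case rolling back turns out to be impossible, the alternative I'm considering is a small version-agnostic accessor (a sketch; it only assumes the old API exposes a `.stoi` attribute and the new one a `.get_stoi()` method):

```python
def vocab_stoi(v):
    """Return the token->index dict from either an old- or new-style vocab."""
    if hasattr(v, "get_stoi"):    # new-style torchtext API
        return v.get_stoi()
    return v.stoi                 # old-style torchtext API

# Dummy objects standing in for the two torchtext APIs:
class OldVocab:
    stoi = {"a": 0}

class NewVocab:
    def get_stoi(self):
        return {"a": 0}

print(vocab_stoi(OldVocab()), vocab_stoi(NewVocab()))
```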