You need to activate the corresponding transform "bart" on the datasets you want to use it with. This is not very clear in the docs; we should probably update them.
Add `transforms: [bart]` either at the root of your config, if you want it applied to all datasets, or on the specific datasets in your data entries.
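For example, a config could look like this (corpus names and paths here are just placeholders, not from your setup):

```yaml
# Option 1: apply the transform to every dataset (root level)
transforms: [bart]

# Option 2: apply it only to a specific corpus
data:
    corpus_1:
        path_src: data/src-train.txt   # placeholder path
        path_tgt: data/tgt-train.txt   # placeholder path
        transforms: [bart]             # applies to corpus_1 only
```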
I’m running it in a docker image and installed opennmt via pip. Here’s what that shows:
root@a33d70122f86:/mh# pip list | grep nmt
pyonmttok 1.25.0
I’m running onmt_train from a shell script. The script does a bunch of data wrangling, builds the vocab, and then hits the error. That error triggers a cascade of additional errors, which I’ve cut off below.
Mike H.
root@a33d70122f86:/mh# ./both.sh
ady
gre
ice
ita
khm
lav
mlt_latn
rum
slv
wel_sw
Corpus corpus_1's weight should be given. We default it to 1 for you.
[2021-04-15 13:42:42,294 INFO] Counter vocab from 7000 samples.
[2021-04-15 13:42:42,294 INFO] Build vocab on 7000 transformed examples/corpus.
[2021-04-15 13:42:42,301 INFO] corpus_1's transforms: TransformPipe(BARTNoiseTransform(None))
[2021-04-15 13:42:42,301 INFO] Loading ParallelCorpus(/workspace/big/BIG_src-train.txt, /workspace/big/BIG_tgt-train.txt, align=None)...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/opt/conda/lib/python3.8/site-packages/onmt/inputters/corpus.py", line 298, in build_sub_vocab
maybe_example = DatasetAdapter._process(item, is_train=True)
File "/opt/conda/lib/python3.8/site-packages/onmt/inputters/corpus.py", line 69, in _process
maybe_example = transform.apply(
File "/opt/conda/lib/python3.8/site-packages/onmt/transforms/transform.py", line 189, in apply
example = transform.apply(
File "/opt/conda/lib/python3.8/site-packages/onmt/transforms/bart.py", line 380, in apply
if is_train and self.vocabs is not None:
AttributeError: 'BARTNoiseTransform' object has no attribute 'vocabs'
"""
The above exception was the direct cause of the following exception:
...
Hi @hammondm,
I just checked the code. The error comes not from training, but from building the vocab. I’ll submit a PR to fix this issue.
As a temporary workaround, you can remove the bart transform when running build_vocab and only add it in the train config. The error will disappear, and this won’t affect the result.
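In other words, the workaround is to keep two slightly different configs (file names and paths below are placeholders):

```yaml
# build_vocab.yaml -- used with onmt_build_vocab: no bart here
data:
    corpus_1:
        path_src: data/src-train.txt   # placeholder path
        path_tgt: data/tgt-train.txt   # placeholder path
        # no transforms entry, so vocab building skips BARTNoiseTransform

# train.yaml -- used with onmt_train: bart enabled as usual
# data:
#     corpus_1:
#         path_src: data/src-train.txt
#         path_tgt: data/tgt-train.txt
#         transforms: [bart]
```

Since the BART noise is only applied on the fly during training, leaving it out of the vocab-building step doesn’t change the resulting vocab.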