Masking/noise during training

Hi

I’ve been trying to get masking to work, but the relevant parameters seem to have no effect. For example:

mask_ratio: 0.1
mask_length: word
rotate_ratio: 0.1
insert_ratio: 0.1
permute_sent_ratio: 0.1

Whatever value I set, there’s no difference.

Is there some other parameter that turns these on?

thanks,

mike h.

You need to activate the corresponding transform, "bart", on the datasets you want to use it on. This is not very clear in the docs; we should probably update them.

Add transforms: [bart] either at the root of your config if you want it to be applied on all datasets, or on some specific datasets in your data entries.
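
For illustration, here is a minimal sketch of what that could look like. The corpus name and paths are placeholders, not taken from your setup, and the noise values are the ones from your post; the only real change is the transforms line.

```yaml
# Option 1: at the root, so bart is applied to every corpus
transforms: [bart]

# BART noise options (values copied from the question above)
mask_ratio: 0.1
mask_length: word
rotate_ratio: 0.1
insert_ratio: 0.1
permute_sent_ratio: 0.1

data:
    corpus_1:                           # placeholder corpus name
        path_src: data/src-train.txt    # placeholder path
        path_tgt: data/tgt-train.txt    # placeholder path
        weight: 1
        # Option 2: instead of the root-level line, enable it per corpus:
        # transforms: [bart]
```

Without a transforms entry the noise options are still parsed, but nothing uses them, which would explain why changing the values made no difference.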

Francois:

Thank you. I tried that and now I’m getting this error:

AttributeError: 'BARTNoiseTransform' object has no attribute 'vocabs'

Any ideas?

mike h

cc @Zenglinxiao

Could you please share the OpenNMT version (commit) you are working with and the detailed error trace?

Hi Linxiao

I’m running it in a docker image and installed opennmt via pip. Here’s what that shows:

root@a33d70122f86:/mh# pip list | grep nmt
pyonmttok              1.25.0 

I’m running onmt_train from a shell script. It does a bunch of data wrangling, builds the vocab, and then seems to hit the error. That error seems to generate a bunch of additional problems, but I’ve cut them off below.

Mike H.

root@a33d70122f86:/mh# ./both.sh 
ady
gre
ice
ita
khm
lav
mlt_latn
rum
slv
wel_sw
Corpus corpus_1's weight should be given. We default it to 1 for you.
[2021-04-15 13:42:42,294 INFO] Counter vocab from 7000 samples.
[2021-04-15 13:42:42,294 INFO] Build vocab on 7000 transformed examples/corpus.
[2021-04-15 13:42:42,301 INFO] corpus_1's transforms: TransformPipe(BARTNoiseTransform(None))
[2021-04-15 13:42:42,301 INFO] Loading ParallelCorpus(/workspace/big/BIG_src-train.txt, /workspace/big/BIG_tgt-train.txt, align=None)...
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.8/site-packages/onmt/inputters/corpus.py", line 298, in build_sub_vocab
    maybe_example = DatasetAdapter._process(item, is_train=True)
  File "/opt/conda/lib/python3.8/site-packages/onmt/inputters/corpus.py", line 69, in _process
    maybe_example = transform.apply(
  File "/opt/conda/lib/python3.8/site-packages/onmt/transforms/transform.py", line 189, in apply
    example = transform.apply(
  File "/opt/conda/lib/python3.8/site-packages/onmt/transforms/bart.py", line 380, in apply
    if is_train and self.vocabs is not None:
AttributeError: 'BARTNoiseTransform' object has no attribute 'vocabs'
"""

The above exception was the direct cause of the following exception:
...

Hi @hammondm,
I just checked the code. The error does not come from training; it comes from building the vocab. I’ll submit a PR to fix this issue.
As a temporary workaround, you can remove the bart transform from the config you use to build the vocab and add it only in the train config. The error will disappear, and it won’t affect the result.
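
Roughly, the workaround could be laid out as two config files like the sketch below. The file names, save_data prefix, and vocab paths are assumptions for illustration; the corpus paths are the ones from your log, and the model and training options are left out.

```yaml
# build_vocab.yaml (hypothetical file name): no bart transform here
save_data: run/example                  # assumed output prefix
src_vocab: run/example.vocab.src        # assumed vocab path
tgt_vocab: run/example.vocab.tgt        # assumed vocab path
data:
    corpus_1:
        path_src: /workspace/big/BIG_src-train.txt
        path_tgt: /workspace/big/BIG_tgt-train.txt
        weight: 1
---
# train.yaml (hypothetical file name): same data and vocab, bart enabled
src_vocab: run/example.vocab.src
tgt_vocab: run/example.vocab.tgt
transforms: [bart]
mask_ratio: 0.1
mask_length: word
data:
    corpus_1:
        path_src: /workspace/big/BIG_src-train.txt
        path_tgt: /workspace/big/BIG_tgt-train.txt
        weight: 1
```

Each part would go in its own file: pass the first to onmt_build_vocab and the second to onmt_train. Setting weight explicitly should also silence the "weight should be given" warning in your log.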

Hi Linxiao

Excellent. Yes, that’s working; thanks!

mike h