Hello.
I tried to apply some word-level noise in back-translation and ran into some issues, perhaps related to my custom BPE implementation. My config looks like this:
params:
  decoding_subword_token: ·
  decoding_noise:
    - dropout: 0.1
    - replacement: [0.1, <unk>]
    - permutation: 3
If I just remove the first line under params (decoding_subword_token), then everything seems to work, but the noise is applied at the subword level. However, I think it makes more sense to apply it at the word level, so that, for example, re-ordering noise does not generate gibberish out of subwords. My BPE is custom and I use this upper period symbol to denote a space. Here, for example, is my tokenised input:
if· you· ˿' re· in· there· ˿,· I· have· to· talk· toy ou· ˿!
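For illustration, the word-level grouping intended here can be sketched in plain Python. This is not ONMT-tf code: the helper names group_into_words and permute_words are hypothetical, the sketch assumes that a trailing · closes a word, and it uses a full shuffle rather than the bounded permutation that the permutation option presumably applies.

```python
import random

SPACER = "·"  # assumption: a trailing "·" marks the end of a word


def group_into_words(tokens, spacer=SPACER):
    """Group subword tokens into words; a token ending with the spacer closes a word."""
    words, current = [], []
    for token in tokens:
        current.append(token)
        if token.endswith(spacer):
            words.append(current)
            current = []
    if current:  # trailing subwords that have no closing spacer
        words.append(current)
    return words


def permute_words(words, rng):
    """Shuffle whole words so that the subwords of one word stay adjacent."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return [token for word in shuffled for token in word]


line = "if· you· ˿' re· in· there· ˿,· I· have· to· talk· toy ou· ˿!"
tokens = line.split()
words = group_into_words(tokens)
noisy = permute_words(words, random.Random(0))
```

With this grouping, "˿'" and "re·" always move together, which is the behaviour wanted from word-level noise.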
The · symbol denotes a new word (space) and (˿) denotes that this subword should be concatenated to the previous one without a space. The latter is probably irrelevant. So I pass this symbol in my config, and I also modified ONMT-tf to enforce is_spacer=True, which I verified by printing. I had to do this because the current implementation treats everything other than SentencePiece's spacer as a joiner, which is not true in my case. I am now getting this error at inference:
Traceback (most recent call last):
  File "/home/estergiadis/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/estergiadis/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/estergiadis/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Need minval < maxval, got 0 >= -1
  [[{{node transformer/map/while/cond/random_uniform}}]]
During handling of the above exception, another exception occurred:
Any ideas?