Hello!
I have trained my OpenNMT-py model. For tokenization I ran python OpenNMT-py/tools/learn_bpe.py,
and did so quite blindly, I must confess.
Now I’m faced with the task of running a REST server as described in this tutorial:
The issue is, I don’t know which tokenization parameters to pass in the config (in the available models dir). Which type did I use by default, and do I need to provide any arguments besides the type and the path to “src.code”?
I did just that and it… did not crash, but the translator got noticeably worse. It now returns empty strings or single dashes for almost every query. There is definitely a difference, though.
And yes, the delimiter is always "@@ " - two “ats” followed by a space:
“dissection” > “dis@@ section”
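For anyone checking the same thing: a quick way to confirm the delimiter is to undo the segmentation by hand and see whether the original words come back. This is only a minimal sketch, assuming the "@@ " delimiter shown above; the function name merge_bpe is made up for illustration and is not part of OpenNMT-py.

```python
import re


def merge_bpe(line: str) -> str:
    """Rejoin BPE pieces by dropping the '@@ ' delimiter (hypothetical helper)."""
    # Remove '@@ ' between pieces, plus a dangling '@@' at the very end of the line.
    return re.sub(r"@@ |@@ ?$", "", line)


if __name__ == "__main__":
    print(merge_bpe("dis@@ section"))  # prints: dissection
```

If the server output still contains "@@" after a step like this, the delimiter in the config probably does not match the one used when the codes were learned.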
They are the same. Thanks! I should have checked myself. BPE works properly with that config.
That said, I ran a few more tests, and performance is far better without BPE across the board…