Hi, I wanted to use this tutorial of yours in a class on MT, but unfortunately things seem to be broken now.
There no longer seems to be a requirements.txt file in the Git repository.
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'OpenNMT-py/requirements.txt'
I see a requirements.opt file, but installing this file raises some errors.
BPE processing works fine, but when preprocessing, I get:
ModuleNotFoundError: No module named 'configargparse'
I am working through your tutorial. It looks very useful, but I seem to be getting exactly the same issue:
File "/content/OpenNMT-py/onmt/opts.py", line 4, in <module>
import configargparse
ModuleNotFoundError: No module named 'configargparse'
I understand you solved the issue. Am I somehow working with an old version?
Thanks
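For anyone hitting the same traceback, here is a minimal sketch for checking which modules are missing before running the OpenNMT-py scripts. The helper name `missing_modules` is made up for illustration, and the list of modules to check is an assumption based only on the error quoted above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "configargparse" is the module named in the traceback above; add any
    # other module your checkout complains about.
    for name in missing_modules(["configargparse"]):
        # For configargparse the pip package has the same name as the module;
        # that is not guaranteed for other packages.
        print(f"Missing module: {name} (try: pip install {name})")
```

In a notebook, running `pip install configargparse` in a cell before the preprocessing step should be enough to clear this particular error, assuming nothing else is missing.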
First of all, thank you for the notebook, it's really helpful. I see that you start by using BPE [Subword Tokenization], but I wonder whether it is necessary to run a tokenizer before BPE to split all tokens by spaces. Does BPE do this itself? Or are you assuming the input text to BPE is already tokenized this way?
No. Speaking at a very high level, BPE just compresses your text and helps you reduce the computation.
You will have to clean your text data, remove the noise, and perform all the usual preprocessing on the text (e.g. lemmatisation, removing stop words, and whatever other steps you normally run) before moving on to the next stage in the pipeline.
So yes, you will have to tokenize your text, but this all depends on the task.
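To make the "tokenize before BPE" point concrete, here is a rough sketch of a whitespace pre-tokenizer that splits punctuation off words, so that each space-separated item is a clean token by the time BPE sees it. This is only a stand-in written for illustration (the function name `pretokenize` and the regex are my own, not from the tutorial); for real data a proper tokenizer such as the Moses one is usually a better choice.

```python
import re

# A word is a run of letters/digits/underscores; anything else that is not
# whitespace (punctuation, symbols) becomes its own token.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def pretokenize(line):
    """Return `line` with tokens separated by single spaces."""
    return " ".join(_TOKEN_RE.findall(line))


if __name__ == "__main__":
    print(pretokenize("Hello, world! BPE comes after this step."))
```

Note that this simple rule also splits contractions and hyphenated words at the punctuation mark, which may or may not be what you want for your language pair.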
Hi,
Is there by any chance a new version of this tutorial? When I try it now (with OpenNMT version 1.2), there is no preprocess.py script in the download anymore.
thanks,
v.
Hi, thanks a lot, this solved my problem.
For detokenization, I used:
<sed -i 's/@@ //g' OpenNMT-py/data/pred.txt>
It gave a syntax error. Do you have a solution for this?
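The exact error message isn't shown, so this is a guess: commands copied from a web page often carry typographic (curly) quotes or extra wrapping characters, which the shell does not treat as plain quotes. If retyping the command with straight single quotes doesn't help, the same substitution can be done in Python. The sketch below mirrors the `s/@@ //g` pattern from the command above; the function names are made up for illustration.

```python
from pathlib import Path


def remove_bpe(line, marker="@@ "):
    """Join BPE pieces by deleting every occurrence of the marker plus space."""
    return line.replace(marker, "")


def remove_bpe_file(path, marker="@@ "):
    """Rewrite the file at `path` in place with BPE markers removed."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(remove_bpe(text, marker), encoding="utf-8")


if __name__ == "__main__":
    print(remove_bpe("the trans@@ la@@ tion looks fine"))
    # To process the predictions file from the command above, uncomment:
    # remove_bpe_file("OpenNMT-py/data/pred.txt")
```

Like the sed pattern, this only removes markers that are followed by a space, so a marker at the very end of a line is left in place.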