Tutorial for OpenNMT-py using Colab

park · November 18, 2019, 3:57am

Yes, I have already fixed that problem!

mbhoshen · January 20, 2020, 9:38am

I am working through your tutorial. It looks very useful, but I seem to be getting exactly the same issue:
File “/content/OpenNMT-py/onmt/opts.py”, line 4, in
import configargparse
ModuleNotFoundError: No module named ‘configargparse’

I understand you solved the issue. Am I somehow working from an old version?
Thanks
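
For anyone hitting the same ModuleNotFoundError: the thread does not record what the fix was, but one quick way to see which packages the Colab runtime lacks is to probe for them before running the scripts. This is a minimal sketch; only configargparse comes from the traceback above, and the REQUIRED list and the missing_modules helper are hypothetical names for illustration. If a module is reported missing, running !pip install configargparse in a Colab cell is the usual remedy.

```python
import importlib.util

# Assumption: extend this list with whatever your scripts import.
# Only "configargparse" is taken from the traceback in this thread.
REQUIRED = ["configargparse"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"Missing module: {name} (try: pip install {name})")
```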

park · January 20, 2020, 9:55am

Here is the link

anavc94 · January 24, 2020, 12:34pm

Hello,

First of all, thank you for the notebook, it’s really helpful. I see that you start by using BPE [Subword Tokenization], but I wonder whether it is necessary to run a tokenizer before BPE to split all tokens by space. Does BPE do this itself? Or are you assuming the input text to BPE is already tokenized this way?

Thank you so much

ishaansharma · January 26, 2020, 6:04pm

No. Speaking at a very high level, BPE just compresses your text and helps you reduce the computation.
You will have to clean your text data, remove the noise, and perform all the usual preprocessing on the text (for example lemmatisation, removing stop words, and whatever other steps you normally run) before moving on to the next stage in the pipeline.
So yes, you will have to tokenize your text, but this all depends on the task.
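
To make the "tokenize before BPE" step concrete, here is a minimal sketch of a whitespace/punctuation pre-tokenizer. It is not the tokenizer used in the notebook; the pretokenize function and its regular expression are assumptions for illustration, and a real pipeline would more likely use an established tokenizer. It only shows the idea of separating punctuation from words so that BPE sees space-delimited tokens.

```python
import re

# A token is either a run of word characters, or one single character
# that is neither a word character nor whitespace (i.e. punctuation).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def pretokenize(sentence):
    """Crude pre-tokenization: split punctuation off words, join with spaces."""
    return " ".join(_TOKEN_RE.findall(sentence))


if __name__ == "__main__":
    print(pretokenize("Hello, world!"))
```

This is deliberately crude (for instance, it splits apostrophes away from the words around them), so treat it as a placeholder for a proper tokenizer.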

tel34 · January 27, 2020, 6:42pm

Take a look at SentencePiece at https://github.com/google/sentencepiece. You can tokenize and apply BPE at the same time.

anavc94 · February 4, 2020, 10:29am

Hi,

thank you so much for the explanation. I will start with those steps.

Regards,
Ana

anavc94 · February 4, 2020, 10:30am

Hi,

thanks for the suggestion. I will also take a look at SentencePiece.

Regards,
Ana

vincent · October 27, 2020, 10:28am

Hi,
Is there by any chance a new version of this tutorial? When I try it now (with OpenNMT version 1.2), there is no preprocess.py script in the download anymore.
thanks,
v.

francoishernandez · October 27, 2020, 10:41am

Not sure @park has done an updated version for v2 yet.
In the meantime, you can check out the legacy branch if you want to use this tutorial.

vincent · October 27, 2020, 1:32pm

Thanks. If I change the git clone command to:
!git clone -b legacy https://github.com/OpenNMT/OpenNMT-py
it works fine.

eTranslate · November 15, 2020, 4:58am

Hi Paul, is it possible for you to make a video explaining how to host OpenNMT models?

debartha-saha · November 27, 2020, 7:37am

Hi, thanks a lot, this solved my problem.
For detokenization, I used:
<sed -i “s/@@ //g” OpenNMT-py/data/pred.txt>
It gave a syntax error. Do you have a solution for this?

vincent · November 27, 2020, 8:01am

!sed "s/@@ //g" works fine: drop the angle brackets < > and put an exclamation mark ! in front to run a Linux command.
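
If you would rather do the same clean-up in Python than with sed, the substitution above amounts to deleting every "@@ " sequence. A minimal sketch, assuming the predictions use the "@@ " continuation marker as in the sed command; the remove_bpe and remove_bpe_file names are hypothetical helpers, not part of OpenNMT-py.

```python
def remove_bpe(line, marker="@@ "):
    """Rejoin BPE pieces by deleting the continuation marker, like sed "s/@@ //g"."""
    return line.replace(marker, "")


def remove_bpe_file(in_path, out_path):
    """Apply remove_bpe to every line of in_path and write the result to out_path."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_bpe(line))


if __name__ == "__main__":
    print(remove_bpe("the trans@@ la@@ tion works"))
```

Unlike sed -i, this writes to a separate output file, so the original predictions are kept.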

lcaei · December 5, 2020, 9:10am

Hi
I’ve trained and hosted my model, but it produces very poor predictions and no sentence is translated correctly. For training I used the default tokenization, that is, learn BPE and apply BPE on the sentences, but in the JSON configuration file I used a downloaded pretrained SentencePiece model. I used SentencePiece because I’m on Windows and pyonmttok does not work on Windows.

debartha-saha · December 18, 2020, 3:11am

Traceback (most recent call last):
File “/usr/lib/python3.6/multiprocessing/pool.py”, line 119, in worker
result = (True, func(*args, **kwds))
File “/content/OpenNMT-py/onmt/bin/preprocess.py”, line 54, in process_one_shard
assert len(src_shard) == len(tgt_shard)
AssertionError

…
I am getting the above error while running preprocess.py on my dataset.
I am applying BPE before preprocessing.
Please help me solve this problem.
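
The failing assertion compares len(src_shard) with len(tgt_shard), so a first thing to check is whether the source and target files still have the same number of lines after BPE was applied. A minimal sketch of such a check, assuming plain UTF-8 text files with one sentence per line; parallel_report, check_files, and any file names you pass in are hypothetical, not part of OpenNMT-py. It also lists blank lines, which are worth a look because they may point to a step that dropped or emptied sentences.

```python
def parallel_report(src_lines, tgt_lines):
    """Return (n_src, n_tgt, blank_src, blank_tgt) for two sequences of lines.

    blank_src and blank_tgt hold the 1-based numbers of lines that are
    empty or contain only whitespace.
    """
    src = list(src_lines)
    tgt = list(tgt_lines)
    blank_src = [i for i, line in enumerate(src, start=1) if not line.strip()]
    blank_tgt = [i for i, line in enumerate(tgt, start=1) if not line.strip()]
    return len(src), len(tgt), blank_src, blank_tgt


def check_files(src_path, tgt_path):
    """Run parallel_report on two text files (paths are supplied by the caller)."""
    with open(src_path, encoding="utf-8") as fs, \
            open(tgt_path, encoding="utf-8") as ft:
        return parallel_report(fs, ft)
```

If the two counts differ, the files are no longer aligned line by line and need to be regenerated or fixed before preprocessing.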

Kenneth · January 18, 2021, 1:06pm

Good day! Is there an updated GitHub repo for this tutorial, one where the previously mentioned errors no longer occur (preprocess.py not found, functions not found, etc.)?

Kenneth · January 19, 2021, 8:27am

I also found out that the preprocess.py file is actually empty; there is no code inside.

vincent · January 26, 2021, 1:36pm

Here is a new Google Colab notebook for OpenNMT-py 2.x

Shikhar-S · February 8, 2021, 7:00am

This notebook helps with the CUDA 10.1 incompatibility on Colab. Thanks!