Tutorial for OpenNMT-py using Colab

Hello everyone, I made a basic OpenNMT-py (PyTorch) tutorial using Colab.
Here is the link

It covers installation, preprocessing, basic training, Transformer training, and translation, all in Colab.
I hope it will be helpful to those who are new to OpenNMT-py.

If you have any questions, please email me.


Thank you, this is really helpful!

I am continuing to expand the contents.
Added theory and BPE sections.

Hi, I wanted to use this tutorial of yours in a class on MT, but unfortunately things seem to be broken now.
There no longer seems to be a requirements.txt file in the git repository.

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'OpenNMT-py/requirements.txt'

I see a requirements.opt file, but installing this file raises some errors.

BPE processing works fine, but when preprocessing, I get:
ModuleNotFoundError: No module named 'configargparse'

I hope there is an easy fix.

Looks like @park made the required change 7 days ago. You might need to get the latest version of his notebook.

Thank you very much – now that is an easy fix :grinning:

Yes, I have already fixed that problem!

I am working through your tutorial. It looks very useful, but I seem to be getting exactly the same issue:
File "/content/OpenNMT-py/onmt/", line 4, in
import configargparse
ModuleNotFoundError: No module named 'configargparse'

I understand you have solved the issue. Am I somehow working with an old version?
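
A quick way to tell whether the notebook environment is simply missing the dependency is to test for the module from a cell before running preprocessing. This is only a minimal sketch: the helper name `missing_modules` is hypothetical and not part of OpenNMT-py or the notebook, and the suggested install command is the usual pip form for the package named in the error above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "configargparse" is the module named in the ModuleNotFoundError above.
missing = missing_modules(["configargparse"])
if missing:
    print("Missing:", ", ".join(missing))
    # In a Colab cell, shell commands are prefixed with "!".
    print("In Colab, try: !pip install " + " ".join(missing))
else:
    print("All required modules are importable.")
```

If the module is reported as missing, installing it and re-running the failing cell should show whether an outdated copy of the notebook was the cause.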

Here is the link


First of all, thank you for the notebook, it's really helpful. I have seen that you start using BPE [Subword Tokenization], but I wonder whether it is necessary to run a tokenizer before BPE to split all tokens by spaces. Does BPE do this itself? Or are you assuming that the input text to BPE is already tokenized in this way?

Thank you so much

No. Speaking at a very high level, BPE just compresses your text and helps you reduce the computation.
You will have to clean your text data, remove the noise, and perform all the usual preprocessing on the text, viz. lemmatisation, removing stop words, and whatever other tasks you normally perform before moving on to the next stage in the pipeline.
So yes, you will have to tokenize your text, but this all depends on the task.
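
To make the point about tokenizing first more concrete, here is a toy sketch of how BPE merges can be learned from text that has already been split on whitespace. It is an illustration only, not the implementation used in the notebook or in any BPE library; the function name `learn_bpe`, the `</w>` end-of-word marker, and the sample sentence are assumptions made for this example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of words."""
    # Each word becomes a tuple of symbols; "</w>" marks the end of a word,
    # so merges never cross word boundaries.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing the best pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# The input is split on whitespace first; BPE then works inside each word.
text = "low low low lower lowest"
merges = learn_bpe(text.split(), num_merges=2)
print(merges)
```

Because the pairs are counted inside each word, punctuation that is still attached to a word (for example `lowest.`) is treated as part of that word, which is why a tokenization step before BPE matters.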

Take a look at SentencePiece. You can tokenize and apply BPE at the same time.


Thank you so much for the explanation. I will start with those steps.



Thanks for the suggestion. I will also take a look at SentencePiece.


Is there by any chance a new version of this tutorial? When I try it now (with OpenNMT version 1.2), there is no script anymore in the download.

I am not sure @park has done an updated version for v2 yet.
In the meantime, you can check out the legacy branch if you want to use this tutorial.

Thanks, if I change the clone github command into:
!git clone -b legacy
it works fine.


Hi Paul, is it possible for you to make a video explaining how to host OpenNMT models?

Hi, Thanks a lot, this solved my problem.
For detokenization, I used:
<sed -i “s/@@ //g” OpenNMT-py/data/pred.txt>
It gave a syntax error. Do you have a solution for this?
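
As posted, the command is wrapped in `<` and `>` and uses curly quotes, either of which a shell may reject; whether that is the actual cause here is an assumption. One way to sidestep shell quoting entirely is to do the same substitution in Python. A minimal sketch, assuming the predictions use the `@@ ` continuation marker that the sed pattern targets; the function names `remove_bpe_markers` and `detokenize_file` are hypothetical.

```python
from pathlib import Path


def remove_bpe_markers(text):
    """Join BPE pieces by deleting every "@@ " separator (same effect as sed 's/@@ //g')."""
    return text.replace("@@ ", "")


def detokenize_file(path):
    """Rewrite the file at `path` in place with the "@@ " separators removed."""
    p = Path(path)
    p.write_text(remove_bpe_markers(p.read_text(encoding="utf-8")), encoding="utf-8")


print(remove_bpe_markers("the new@@ est trans@@ lation"))

# To process the predictions file mentioned above, one could call:
# detokenize_file("OpenNMT-py/data/pred.txt")
```

Like `sed -i`, `detokenize_file` overwrites the file, so keep a copy of the raw predictions if you still need them.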