for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; done; done
for l in en de; do for f in data/multi30k/*.$l; do perl tools/tokenizer.perl -a -no-escape -l $l -q < $f > $f.atok; done; done
onmt_preprocess -train_src data/multi30k/train.en.atok -train_tgt data/multi30k/train.de.atok -valid_src data/multi30k/val.en.atok -valid_tgt data/multi30k/val.de.atok -save_data data/multi30k.atok.low -lower
This looks like a bash script.
I’m running it in my PERSONAL openNMT folder with data/multi30k in it.
I can see it’s accessing the .en .de files but it fails with errors:
sed: 1: "data/multi30k/train.en": extra characters at the end of d command
sed: 1: "data/multi30k/val.en": extra characters at the end of d command
sed: 1: "data/multi30k/train.de": extra characters at the end of d command
sed: 1: "data/multi30k/val.de": extra characters at the end of d command
Can't open perl script "tools/tokenizer.perl": No such file or directory
Can't open perl script "tools/tokenizer.perl": No such file or directory
Can't open perl script "tools/tokenizer.perl": No such file or directory
Can't open perl script "tools/tokenizer.perl": No such file or directory
Can't open perl script "tools/tokenizer.perl": No such file or directory
Can't open perl script "tools/tokenizer.perl": No such file or directory
Why?
Why do I get sed errors like “extra characters at the end of d command”?
I have perl.
But why should I mysteriously have a tools/tokenizer.perl structure?
Where do I get that?
Where should I be running this?
Are you on macOS? Your sed error looks very much like this.
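For what it’s worth, here is my understanding of why it fails: the BSD sed shipped with macOS expects a backup suffix right after -i, so it takes "$ d" as the suffix and then tries to parse the file name as the sed script. "data/multi30k/train.en" starts with d, hence “extra characters at the end of d command”. Below is a minimal sketch of a workaround that avoids -i altogether, so it should behave the same with GNU and BSD sed. The helper name drop_last_line is made up for this example.

```shell
# Delete the last line of a file without relying on `sed -i`,
# whose argument handling differs between GNU sed and BSD/macOS sed.
# (drop_last_line is a hypothetical helper name, not part of OpenNMT.)
drop_last_line() {
    file="$1"
    sed '$ d' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Same loop as in the tutorial: every .en/.de file except the test files.
for l in en de; do
    for f in data/multi30k/*."$l"; do
        [ -e "$f" ] || continue              # the glob matched nothing
        case "$f" in *test*) continue ;; esac
        drop_last_line "$f"
    done
done
```

If you only care about macOS, sed -i '' '$ d' "$f" (an empty backup suffix) should also do it, but that form does not work with GNU sed.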
For the tokenizer error, make sure you execute this from your OpenNMT-py folder.
Sidenote, this Translation tutorial is not really up to date. To save you some time, you can have a look at the quickstart first, and then at the transformer and read about subword tokenization.
Yes, I’m on Mac OS.
Out of date tutorial? OK.
I need to run in ‘my’ OpenNMT folder? You mean where it’s installed? It’s a mystery to me where it’s installed. I’ll try and find it.
And I should go to Quickstart instead?
OK . .
It’s functional, but the methods and command line may not be the most appropriate now.
I need to run in ‘my’ OpenNMT folder? You mean where it’s installed? It’s a mystery to me where it’s installed. I’ll try and find it.
As you wrote “I’m running it in my PERSONAL openNMT folder with data/multi30k in it.” I thought you had done a git clone. If you installed via pip it may be easier for you to get the perl scripts separately. You can retrieve them here.
You can also use OpenNMT’s Tokenizer for tokenization.
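If you want to see where pip actually put the package, something like the sketch below should work. It assumes python3 is the same interpreter you installed into (with Anaconda it may just be python), and module_dir is a name made up for this example. As far as I know, a pip install only contains the Python package itself, so don’t expect to find the data or tools folders there.

```shell
# Print the directory a Python module is loaded from.
# (module_dir is a hypothetical helper name; python3 is assumed to be
# the interpreter the package was installed into.)
module_dir() {
    python3 - "$1" <<'EOF'
import importlib.util
import os
import sys

spec = importlib.util.find_spec(sys.argv[1])
if spec is None or spec.origin is None:
    sys.exit("module not found: " + sys.argv[1])
print(os.path.dirname(spec.origin))
EOF
}

# onmt is the module name that OpenNMT-py installs.
module_dir onmt || echo "onmt is not importable from this Python"
```

pip show OpenNMT-py should give the same information on its “Location:” line.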
I just want to get it running.
Yes I installed via Pip after a pytorch install via Anaconda.
I looked in that github repo.
It’s unclear to me what I should do next to get a demo running.
I can usually get demos running but OpenNMT is a few steps beyond me (I’m a Python ML coder, not a systems administrator or whatever it is you guys are).
I can’t even work out where OpenNMT is installed. I’ve looked in all sorts of alien locations like /usr/bin/local, /anaconda/ . . /lib/ . . etc. I found the license file and a few other things. NO data folder or perl folder.
Just don’t use this data (and hence those sed/tokenizer commands). It won’t give you any good results anyway. You’ll need much bigger datasets.
Use the data folder from the repo to try and have the commands from Quickstart running.
I still can’t get OpenNMT working . .
Python 3.7 installed via Anaconda.
Git clone works fine.
python setup.py install works fine.
Data folder contains the demo files.
But then the Quickstart demo does NOT work.
See ^ for errors.
Looks like there is something broken in your install.
You can try executing the scripts directly, from the directory in which you cloned.
onmt_preprocess --> preprocess.py
onmt_train --> train.py
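In case it helps, here is a rough sketch of that fallback as a small shell helper: use the console script if it is on your PATH, otherwise call the script file from the clone. onmt_cmd is a name I made up, and the file names in the commented-out usage lines are the demo files as I remember them from the repo’s data folder, so check them against your own copy.

```shell
# Pick the entry point: the console script if it is on PATH,
# otherwise the script file in the cloned repo (run from the clone's root).
# (onmt_cmd is a hypothetical helper name, not part of OpenNMT-py.)
onmt_cmd() {
    name="$1"                        # e.g. "preprocess" or "train"
    if command -v "onmt_$name" >/dev/null 2>&1; then
        echo "onmt_$name"
    else
        echo "python $name.py"
    fi
}

# Usage, from the directory you cloned into. The file names below are
# assumptions; adjust them to whatever is in your data folder.
# $(onmt_cmd preprocess) -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
#     -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
```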
This very much looks like a macOS specific issue, and probably not specifically related to OpenNMT-py.
By the way, you might have some trouble getting anything to run decently on macOS, unless you 1) have an NVIDIA GPU (hence a rather old version of macOS), 2) manage to install the proper drivers, CUDA, etc., and 3) compile PyTorch for your configuration. I believe there may also be some way to compile PyTorch for AMD GPUs, but I have never tried it.
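A quick way to check what PyTorch sees on a given machine is sketched below. It assumes python3 is the interpreter PyTorch was installed into; gpu_status and the three strings it prints are made up for this example, and only torch.cuda.is_available() comes from PyTorch itself.

```shell
# Report whether PyTorch can see a CUDA GPU.
# (gpu_status and its output strings are hypothetical; python3 is assumed
# to be the interpreter PyTorch was installed into.)
gpu_status() {
    python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch-missing")
else:
    print("cuda" if torch.cuda.is_available() else "cpu-only")
EOF
}

gpu_status
```

If this prints cpu-only, training will fall back to the CPU and be very slow.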
The preprocess script you’re looking at is just a wrapper. The “real” script is imported at the top from onmt.bin.preprocess.
I could load it on my PC.
But all the demos use Linux and bash scripts . .
Do I load Linux on my PC first? An emulator?
Or just load the PC version of openNMT?
OpenNMT-py is mainly developed and tested on Linux. The easiest option would be an Ubuntu install on your PC (not a VM, as that would require PCIe passthrough to get the GPU).
Thnx.
It loaded on my PC (on DOS shell, not Linux), installed & the training demo is running!
(But I had to use the direct python X.py -xxxx version of commands)
My laptop probably has no GPU so the training will take . . 172 hrs!