RuntimeError: data sampling generated 0 sentences - nmt-wizard docker

After following the steps in this GitHub repository:

Whenever I run the train command from step 3, I get an error saying that data sampling generated 0 sentences. My data files are named:

src-train tgt-train.txt

Here is the complete traceback:

2021-09-16 16:15:50Z.000704 [beat_service] WARNING start_beat_service: CALLBACK_URL or task_id is unset; beat service will be disabled
2021-09-16 16:15:50Z.000704 [utility] INFO run: Starting executing utility NMT framework=?
2021-09-16 16:15:50Z.000704 [framework] INFO train_wrapper: Starting training model my_model_1
2021-09-16 16:15:50Z.000705 [preprocess] INFO generate_preprocessed_data: Generating data to /root/workspace/data/preprocess
2021-09-16 16:15:50Z.000705 [sampler] WARNING sample: No 'sample' size specified in configuration,all data will be sampled.
Traceback (most recent call last):
  File "entrypoint.py", line 149, in <module>
    OpenNMTPYFramework().run()
  File "/root/nmtwizard/utility.py", line 209, in run
    stats = self.exec_function(args)
  File "/root/nmtwizard/framework.py", line 322, in exec_function
    push_model=not self._no_push)
  File "/root/nmtwizard/framework.py", line 420, in train_wrapper
    self._build_data(local_config))
  File "/root/nmtwizard/framework.py", line 975, in _build_data
    raise RuntimeError('data sampling generated 0 sentences')
RuntimeError: data sampling generated 0 sentences

My directory structure is exactly as shown in the example.

Here is my configuration.json file:


Directory structure in screencap form

Training files should end with the language name, e.g. train.en and train.de.
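A quick way to confirm the naming before launching training is to check that both files exist with the expected language suffixes. The following is a minimal sketch, not part of nmt-wizard-docker: the helper name find_corpus_files, the "train" prefix, and the en/de language codes are assumptions chosen for illustration.

```python
from pathlib import Path


def find_corpus_files(data_dir, prefix, src_lang, tgt_lang):
    """Return (source_path, target_path) for files named <prefix>.<lang>.

    Hypothetical helper: it only checks file names on disk and does not
    reproduce how nmt-wizard-docker itself discovers training files.
    """
    data_dir = Path(data_dir)
    paths = [data_dir / f"{prefix}.{lang}" for lang in (src_lang, tgt_lang)]
    missing = [p.name for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(
            f"expected file(s) not found in {data_dir}: {', '.join(missing)}"
        )
    return tuple(paths)
```

For example, find_corpus_files("data/train", "train", "en", "de") raises FileNotFoundError if the directory only contains files such as src-train.txt.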

To be fully transparent here, the OpenNMT-py framework in the nmt-wizard-docker project is not well tested. I suggest using OpenNMT-py directly unless you really care about one feature listed in the nmt-wizard-docker README.

Good to know. Yes, I have the OpenNMT-py framework working on my system as well, but I wanted to see whether I could get nmt-wizard-docker working. When I rename the files to the correct pattern (train.en), I still get the same error.

Would it be simpler to build my own Docker image and run OpenNMT-py on that?

If it’s just about using Docker, then sure, you can define your own Dockerfile. There is no need to use another project.

To anyone looking to get OpenNMT running on Docker, I suggest downloading the official OpenNMT image (at least I think it's the official image):

https://opennmt.net/OpenNMT/installation/#docker

For the time being, this is the solution I am going with for dockerized OpenNMT. You can use docker cp to move files into the container.

It is very easy to run OpenNMT-py in a Docker container. You can run the image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime and then just install OpenNMT-py inside the container using:

pip install OpenNMT-py
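As a rough illustration of that workflow, the sketch below assembles the docker run command line in Python. The image tag and the pip command come from the post above. Everything else is an assumption made for the example: the helper name docker_run_command, the /workspace/data mount point, the /data/corpus host path, and the choice to bind-mount the data directory with -v rather than copy files in with docker cp. The optional --gpus all flag only works when the NVIDIA container toolkit is installed on the host.

```python
import os
import shlex

# Image tag taken from the post above.
IMAGE = "pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime"


def docker_run_command(data_dir, container_dir="/workspace/data", gpus=False):
    """Build a `docker run` argument list that installs OpenNMT-py on start.

    Hypothetical helper: the mount point and the bind-mount approach are
    illustrative choices, not something prescribed by OpenNMT-py.
    """
    cmd = ["docker", "run", "--rm", "-it"]
    if gpus:
        # Requires the NVIDIA container toolkit on the host.
        cmd += ["--gpus", "all"]
    # Bind mounts need an absolute host path.
    cmd += ["-v", f"{os.path.abspath(data_dir)}:{container_dir}"]
    # Install OpenNMT-py, then drop into an interactive shell.
    cmd += [IMAGE, "bash", "-c", "pip install OpenNMT-py && exec bash"]
    return cmd


# Print a copy-pasteable command line; pass the list to subprocess.run to execute it.
print(shlex.join(docker_run_command("/data/corpus")))
```

Because the package is installed at container start, it is reinstalled on every run; writing a small Dockerfile based on the same image avoids that.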