RuntimeError: data sampling generated 0 sentences - nmt-wizard docker

dougd920 · September 18, 2021, 8:33pm

After following the steps in this GitHub repository:

Whenever I run the train command from step 3, I get an error saying that data sampling generated 0 sentences. The data files are named:

src-train tgt-train.txt

Here is the complete traceback:

2021-09-16 16:15:50Z.000704 [beat_service] WARNING start_beat_service: CALLBACK_URL or task_id is unset; beat service will be disabled
2021-09-16 16:15:50Z.000704 [utility] INFO run: Starting executing utility NMT framework=?
2021-09-16 16:15:50Z.000704 [framework] INFO train_wrapper: Starting training model my_model_1
2021-09-16 16:15:50Z.000705 [preprocess] INFO generate_preprocessed_data: Generating data to /root/workspace/data/preprocess
2021-09-16 16:15:50Z.000705 [sampler] WARNING sample: No 'sample' size specified in configuration,all data will be sampled.
Traceback (most recent call last):
  File "entrypoint.py", line 149, in <module>
    OpenNMTPYFramework().run()
  File "/root/nmtwizard/utility.py", line 209, in run
    stats = self.exec_function(args)
  File "/root/nmtwizard/framework.py", line 322, in exec_function
    push_model=not self._no_push)
  File "/root/nmtwizard/framework.py", line 420, in train_wrapper
    self._build_data(local_config))
  File "/root/nmtwizard/framework.py", line 975, in _build_data
    raise RuntimeError('data sampling generated 0 sentences')
RuntimeError: data sampling generated 0 sentences

My directory structure is exactly as shown in the example.

Here is my configuration.json file:

dougd920 · September 18, 2021, 8:38pm

Directory structure in screencap form

guillaumekln · September 20, 2021, 7:20am

Training files should end with the language name, e.g. train.en and train.de.
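As a rough sketch of that renaming, assuming the source side is English (en) and the target side is German (de) as in the example names, and using a throwaway directory with src-train.txt and tgt-train.txt as stand-ins for the real data folder and files:

```shell
#!/bin/sh
# Hypothetical layout: a scratch directory standing in for the training data folder.
data_dir=$(mktemp -d)
touch "$data_dir/src-train.txt" "$data_dir/tgt-train.txt"

# Assumption: source is English (en), target is German (de).
# Rename so that each file ends with its language code.
mv "$data_dir/src-train.txt" "$data_dir/train.en"
mv "$data_dir/tgt-train.txt" "$data_dir/train.de"

# List the result to confirm the new names.
ls "$data_dir"
```

With real data, replace the scratch directory with the actual data folder and use the language codes of your own corpus.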

guillaumekln · September 20, 2021, 7:30am

To be fully transparent here, the OpenNMT-py framework in the nmt-wizard-docker project is not well tested. I suggest using OpenNMT-py directly unless you really care about one feature listed in the nmt-wizard-docker README.

dougd920 · September 20, 2021, 5:36pm

Good to know. Yes, I have the OpenNMT-py framework working on my system as well, but I am trying to see whether I can get nmt-wizard-docker working. When I rename the files to the correct form (train.en), I still get the same error.

dougd920 · September 27, 2021, 5:17pm

Would it be simpler to build my own Docker image and run OpenNMT-py in that?

guillaumekln · September 27, 2021, 6:29pm

If it’s just about using Docker, then sure, you can define your own Dockerfile. There is no need to use another project.

dougd920 · September 29, 2021, 8:50pm

To anyone who is looking to get OpenNMT running in Docker, I suggest downloading the official OpenNMT image (at least I think it's the official image):

https://opennmt.net/OpenNMT/installation/#docker

For the time being, this is the solution I am going with for Dockerized OpenNMT. You can use docker cp to move files into the container.

guillaumekln · September 30, 2021, 8:00am

It is very easy to run OpenNMT-py in a Docker container. You can run the image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime and then just install OpenNMT-py inside the container using:

pip install OpenNMT-py
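If you would rather bake this into an image, here is a minimal sketch that writes a two-line Dockerfile using the same base image and install command. The scratch build directory, the image tag opennmt-py-local, and the /data mount point are placeholder choices, not anything prescribed by OpenNMT-py:

```shell
#!/bin/sh
# Write a minimal Dockerfile into a scratch directory (hypothetical location).
build_dir=$(mktemp -d)
cat > "$build_dir/Dockerfile" <<'EOF'
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
RUN pip install OpenNMT-py
EOF

# Show the generated file.
cat "$build_dir/Dockerfile"

# To build and start it (tag and mount path are placeholders):
#   docker build -t opennmt-py-local "$build_dir"
#   docker run --rm -it -v "$PWD/data:/data" opennmt-py-local bash
```

Mounting the data directory with -v avoids having to copy files into the container after it starts.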