Whenever I run the train command from step 3, I get an error saying that data sampling generated no sentences. My data files are named:
src-train tgt-train.txt
Here is the complete traceback:
2021-09-16 16:15:50Z.000704 [beat_service] WARNING start_beat_service: CALLBACK_URL or task_id is unset; beat service will be disabled
2021-09-16 16:15:50Z.000704 [utility] INFO run: Starting executing utility NMT framework=?
2021-09-16 16:15:50Z.000704 [framework] INFO train_wrapper: Starting training model my_model_1
2021-09-16 16:15:50Z.000705 [preprocess] INFO generate_preprocessed_data: Generating data to /root/workspace/data/preprocess
2021-09-16 16:15:50Z.000705 [sampler] WARNING sample: No 'sample' size specified in configuration,all data will be sampled.
Traceback (most recent call last):
  File "entrypoint.py", line 149, in <module>
    OpenNMTPYFramework().run()
  File "/root/nmtwizard/utility.py", line 209, in run
    stats = self.exec_function(args)
  File "/root/nmtwizard/framework.py", line 322, in exec_function
    push_model=not self._no_push)
  File "/root/nmtwizard/framework.py", line 420, in train_wrapper
    self._build_data(local_config))
  File "/root/nmtwizard/framework.py", line 975, in _build_data
    raise RuntimeError('data sampling generated 0 sentences')
RuntimeError: data sampling generated 0 sentences
My directory structure is exactly as described in the example.
To be fully transparent here, the OpenNMT-py framework in the nmt-wizard-docker project is not well tested. I suggest using OpenNMT-py directly unless you really care about one feature listed in the nmt-wizard-docker README.
Good to know. Yes, I have OpenNMT-py working on my system as well; however, I am trying to see whether I can get nmt-wizard-docker working. When I rename the files to the correct syntax (train.en), I still get the same error.
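For anyone hitting the same "data sampling generated 0 sentences" error, a quick sanity check of the training directory may help before digging into the framework itself. The sketch below assumes that nmt-wizard-docker pairs training files by language-code suffix (for example train.en with train.de), which is my reading of the naming discussed above and not something this thread confirms. `find_parallel_pairs` and `count_lines` are hypothetical helpers written for illustration; they are not part of nmt-wizard-docker or OpenNMT-py.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def find_parallel_pairs(data_dir, src_lang, tgt_lang):
    """Map each base name to (source line count, target line count).

    Only files that exist with BOTH suffixes are reported, e.g.
    train.en + train.de -> {"train": (n_src, n_tgt)}.
    Assumption: files are paired by language-code suffix.
    """
    data_dir = Path(data_dir)
    pairs = {}
    for src_file in sorted(data_dir.glob(f"*.{src_lang}")):
        # Swap only the final suffix: train.en -> train.de
        tgt_file = src_file.with_suffix(f".{tgt_lang}")
        if tgt_file.is_file():
            pairs[src_file.stem] = (count_lines(src_file), count_lines(tgt_file))
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: python check_pairs.py <data_dir> <src_lang> <tgt_lang>
    directory, src, tgt = sys.argv[1:4]
    found = find_parallel_pairs(directory, src, tgt)
    if not found:
        print(f"No *.{src} / *.{tgt} file pairs found in {directory}")
    for name, (n_src, n_tgt) in found.items():
        print(f"{name}: {n_src} source lines, {n_tgt} target lines")
```

If this reports no pairs, or pairs with zero or mismatched line counts, the problem is probably in the data files or in the directory mounted into the container, not in the training command. If it reports healthy pairs and the error persists, the language codes in the training configuration are the next thing I would compare against the file suffixes.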
To anyone looking to get OpenNMT running in Docker, I suggest downloading the official OpenNMT image (at least I think it's the official image):
It is very easy to run OpenNMT-py in a Docker container. You can run the image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime and then just install OpenNMT-py inside the container using: