I’m trying to use OpenNMT-py with SentencePiece. SentencePiece runs fine, but I get an error when OpenNMT-py tries to save its vocab file after running the transforms.
Based on where the code breaks and on the error message, it seems that the src_vocab value from the config file is an empty string, even though I set it in the config file.
config.yml
# Based on https://opennmt.net/OpenNMT-py/examples/Translation.html
## Where the samples will be written
save_data: openmt-data
## Where the vocab(s) will be written
src_vocab: openmt.vocab.src
tgt_vocab: openmt.vocab.tgt
# Corpus opts:
data:
    corpus_1:
        path_src: split_data/src-train.txt
        path_tgt: split_data/tgt-train.txt
    valid:
        path_src: split_data/src-val.txt
        path_tgt: split_data/tgt-val.txt
### Transform related opts:
#### Subword
src_subword_model: sentencepiece.model
tgt_subword_model: sentencepiece.model
src_subword_nbest: 1
src_subword_alpha: 0.0
tgt_subword_nbest: 1
tgt_subword_alpha: 0.0
#### Filter
src_seq_length: 150
tgt_seq_length: 150
# silently ignore empty lines in the data
skip_empty_level: silent
...
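To double-check that src_vocab really is set, the top-level keys of config.yml can be read back with a few lines of Python. This is a rough sketch: `top_level_value` is a hypothetical helper that only understands flat `key: value` lines at column 0 and is not a YAML parser (nested keys, quoting and multi-line values are ignored).

```python
# Hypothetical helper: reads only top-level "key: value" lines.
# It is NOT a YAML parser (no nesting, quoting or multi-line values).
def top_level_value(config_text: str, key: str):
    for line in config_text.splitlines():
        # skip blank lines, indented (nested) keys and comments
        if not line or line[0].isspace() or line.lstrip().startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # drop an inline " # comment" and surrounding whitespace
            return value.split(" #", 1)[0].strip()
    return None


# Excerpt of config.yml from above.
config_text = """\
save_data: openmt-data
src_vocab: openmt.vocab.src
tgt_vocab: openmt.vocab.tgt
data:
    corpus_1:
        path_src: split_data/src-train.txt
"""

for key in ("src_vocab", "tgt_vocab"):
    print(f"{key} = {top_level_value(config_text, key)!r}")
```

To check the real file instead of the excerpt, pass the contents of `open("config.yml", encoding="utf8").read()` as `config_text`.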
spm_train --input=split_data/all.txt --model_prefix=sentencepiece \
    --vocab_size=$vocab_size --character_coverage=$character_coverage \
    --input_sentence_size=1000000 --shuffle_input_sentence=true
onmt_build_vocab -config config.yml -n_sample -1
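Conceptually, onmt_build_vocab counts tokens over the corpus (the "Counters src/tgt" lines in the logs) and then writes each counter to the src_vocab and tgt_vocab paths. The following is a minimal stdlib sketch of that count-and-save step, not OpenNMT-py's implementation: `count_tokens` and `save_counter_sketch` are hypothetical helpers, tokens are split on whitespace instead of going through the configured transforms, and the `token<TAB>count` output layout is an assumption.

```python
import os
import tempfile
from collections import Counter


# Hypothetical helper: whitespace tokenization only (no subword transforms).
def count_tokens(lines):
    counter = Counter()
    for line in lines:
        counter.update(line.split())
    return counter


# Hypothetical helper: writes one "token<TAB>count" line per token.
def save_counter_sketch(counter, save_path):
    parent = os.path.dirname(save_path)
    if parent:  # a bare filename such as "openmt.vocab.src" has no parent part
        os.makedirs(parent, exist_ok=True)
    with open(save_path, "w", encoding="utf8") as fo:
        # most frequent tokens first
        for tok, count in counter.most_common():
            fo.write(f"{tok}\t{count}\n")


src_lines = ["the cat sat", "the dog sat"]
counter = count_tokens(src_lines)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab", "openmt.vocab.src")
    save_counter_sketch(counter, path)
    with open(path, encoding="utf8") as f:
        print(f.read())
```

The `if parent:` guard is a design choice of this sketch: it only creates a directory when the path actually names one.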
Logs:
trainer_interface.cc(604) LOG(INFO) Saving model: sentencepiece.model
trainer_interface.cc(615) LOG(INFO) Saving vocabs: sentencepiece.vocab
Corpus corpus_1's weight should be given. We default it to 1 for you.
[2021-01-23 14:25:31,286 INFO] Counter vocab from -1 samples.
[2021-01-23 14:25:31,286 INFO] n_sample=-1: Build vocab on full datasets.
[2021-01-23 14:25:31,295 INFO] corpus_1's transforms: TransformPipe()
[2021-01-23 14:25:31,295 INFO] Loading ParallelCorpus(split_data/src-train.txt, split_data/tgt-train.txt, align=None)...
[2021-01-23 14:26:22,880 INFO] Counters src:977077
[2021-01-23 14:26:22,880 INFO] Counters tgt:3370920
Traceback (most recent call last):
  File "/usr/local/bin/onmt_build_vocab", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/onmt/bin/build_vocab.py", line 66, in main
    build_vocab_main(opts)
  File "/usr/local/lib/python3.8/dist-packages/onmt/bin/build_vocab.py", line 53, in build_vocab_main
    save_counter(src_counter, opts.src_vocab)
  File "/usr/local/lib/python3.8/dist-packages/onmt/bin/build_vocab.py", line 42, in save_counter
    check_path(save_path, exist_ok=opts.overwrite, log=logger.warning)
  File "/usr/local/lib/python3.8/dist-packages/onmt/utils/misc.py", line 19, in check_path
    os.makedirs(os.path.dirname(path), exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
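Per the frame in onmt/utils/misc.py, the string passed to `os.makedirs` is `os.path.dirname(path)`, not `path` itself. The last two frames can be reproduced with the standard library alone; this sketch assumes only that the path reaching `check_path` is the src_vocab value from config.yml.

```python
import os

# The src_vocab value from config.yml: a bare filename, no directory part.
src_vocab = "openmt.vocab.src"
parent = os.path.dirname(src_vocab)
print(repr(parent))  # ''

# For comparison, a path with an explicit directory part.
print(repr(os.path.dirname("./openmt.vocab.src")))  # '.'

# The same call as the failing frame in check_path.
try:
    os.makedirs(parent, exist_ok=True)
except FileNotFoundError as err:
    print(type(err).__name__, err)
```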