Hi
I am running some tests in OpenNMT-py. I have a set of src/tgt files tokenized with the case feature/joiner/aggressive options that OpenNMT has been able to process. I can run preprocess and train without any error.
The problem arises when I try to translate: I get an assertion error. Running without the case feature works fine.
I have seen several posts on the forum about this, but I have not been able to find a solution, or the posts are simply unanswered.
Looking at the stack trace, it seems the culprit is in one of the torch files. The system is fairly new: OpenNMT-py was installed just a couple of weeks ago on Ubuntu 18.10 with, as far as I recall, the PyTorch Stable 1.0 build, Python 3.7.1 and CUDA 10.0 (if I run “conda install pytorch torchvision cudatoolkit=10.0 -c pytorch” I get “# All requested packages already installed.”).
nvidia-smi reports driver 418.56 and CUDA version 10.1. I have two 8 GB GTX 1070 cards.
python replies with "Python 3.7.1 (default, Dec 14 2018, 19:28:38) "
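Since the versions above are quoted from memory, a small script along these lines could confirm what is actually installed. It is only a sketch: the function name `report_versions` is made up for illustration, and it deliberately tolerates PyTorch not being importable.

```python
import importlib
import platform


def report_versions():
    """Collect the interpreter version and, if installed, PyTorch/CUDA details."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["torch"] = None  # PyTorch is not importable in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

The CUDA version shown by nvidia-smi belongs to the driver, so it can differ from the `cuda_build` value without that being an error in itself.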
As I am not a Linux expert, take this with care!
I have seen an openmt/opennmt-py image on Docker, but it looks really old. I am not sure if I can use it with my current nvidia-docker setup, which I use for the Docker OpenNMT image (by the way, thanks for providing it!).
So I run:
python /home/laika/OpenNMT-py/preprocess.py -train_src src.atokCJA -train_tgt tgt.atokCJA \
-valid_src src_tunning.atokCJA -valid_tgt tgt_tunning.atokCJA \
-save_data CAES_CJA.data
[2019-04-13 14:06:33,121 INFO] Extracting features...
[2019-04-13 14:06:33,121 INFO] * number of source features: 1.
[2019-04-13 14:06:33,121 INFO] * number of target features: 1.
[2019-04-13 14:06:33,121 INFO] Building `Fields` object...
[2019-04-13 14:06:33,121 INFO] Building & saving training data...
[2019-04-13 14:06:33,122 INFO] Reading source and target files: src.atokCJA tgt.atokCJA.
[2019-04-13 14:06:33,173 INFO] Building shard 0.
[2019-04-13 14:06:38,813 INFO] * saving 0th train data shard to CAES_CJA.data.train.0.pt.
[2019-04-13 14:06:43,956 INFO] Building & saving validation data...
[2019-04-13 14:06:43,956 INFO] Reading source and target files: src_tunning.atokCJA tgt_tunning.atokCJA.
[2019-04-13 14:06:43,958 INFO] Building shard 0.
[2019-04-13 14:06:44,196 INFO] * saving 0th valid data shard to CAES_CJA.data.valid.0.pt.
[2019-04-13 14:06:44,556 INFO] Building & saving vocabulary...
[2019-04-13 14:06:45,748 INFO] * reloading CAES_CJA.data.train.0.pt.
[2019-04-13 14:06:47,330 INFO] * tgt vocab size: 40442.
[2019-04-13 14:06:47,330 INFO] * tgt_feat_0 vocab size: 9.
[2019-04-13 14:06:47,387 INFO] * src vocab size: 36771.
[2019-04-13 14:06:47,387 INFO] * src_feat_0 vocab size: 7.
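The log above reports one source feature and one target feature. To check that every file (including the one later passed to translate.py) carries the same number of features on every token, a quick count like the following may help. This is a hedged sketch: it assumes the feature separator is the "￨" character (U+FFE8), and `feature_count_histogram` is a hypothetical helper, not part of OpenNMT-py.

```python
from collections import Counter

# Assumption: token and feature are joined by "￨" (U+FFE8), e.g. "hello￨C".
FEATURE_SEP = "\uffe8"


def feature_count_histogram(lines, sep=FEATURE_SEP):
    """Count how many tokens carry 0, 1, 2, ... features."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[token.count(sep)] += 1
    return counts


# Tiny in-memory sample; for a real file, pass an open file object instead:
#   with open("src_verify.atokCJA", encoding="utf-8") as handle:
#       print(feature_count_histogram(handle))
sample = ["hello\uffe8C world\uffe8L", "no features here"]
hist = feature_count_histogram(sample)
print(dict(hist))
```

If a file is consistent, the histogram has a single key; more than one key means some tokens have a different number of features than others.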
Then (export CUDA_VISIBLE_DEVICES=0):
python /home/laika/OpenNMT-py/train.py -data CAES_CJA.data -save_model CAES_CJA.data.model -world_size 1 -gpu_ranks 0
[2019-04-13 14:10:26,868 INFO] * src vocab size = 36771
[2019-04-13 14:10:26,868 INFO] * src_feat_0 vocab size = 7
[2019-04-13 14:10:26,868 INFO] * tgt vocab size = 40442
[2019-04-13 14:10:26,868 INFO] * tgt_feat_0 vocab size = 9
[2019-04-13 14:10:26,868 INFO] Building model...
[2019-04-13 14:10:29,683 INFO] NMTModel(
  (encoder): RNNEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(36771, 500, padding_idx=1)
          (1): Embedding(7, 3, padding_idx=1)
        )
      )
    )
    (rnn): LSTM(503, 500, num_layers=2, dropout=0.3)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(40442, 500, padding_idx=1)
          (1): Embedding(9, 4, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.3)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3)
      (layers): ModuleList(
        (0): LSTMCell(1004, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention(
      (linear_in): Linear(in_features=500, out_features=500, bias=False)
      (linear_out): Linear(in_features=1000, out_features=500, bias=False)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=500, out_features=40442, bias=True)
    (1): Cast()
    (2): LogSoftmax()
  )
)
[2019-04-13 14:10:29,684 INFO] encoder: 22399521
[2019-04-13 14:10:29,684 INFO] decoder: 46248478
[2019-04-13 14:10:29,684 INFO] * number of parameters: 68647999
[2019-04-13 14:10:29,685 INFO] Starting training on GPU: [0]
[2019-04-13 14:10:29,685 INFO] Start training loop and validate every 10000 steps...
[2019-04-13 14:10:30,991 INFO] Loading dataset from CAES_CJA.data.train.0.pt, number of examples: 85656
[2019-04-13 14:10:37,727 INFO] Step 50/100000; acc: 4.56; ppl: 560408.95; xent: 13.24; lr: 1.00000; 9100/8965 tok/s; 8 sec
.
.
.
Then:
python /home/laika/OpenNMT-py/translate.py -model CAES_CJA.data.model_step_10000.pt -src src_verify.atokCJA -output output.txt -replace_unk -verbose -gpu 0
[2019-04-13 14:46:46,807 INFO] Translating shard 0.
Traceback (most recent call last):
  File "/home/laika/OpenNMT-py/translate.py", line 48, in <module>
    main(opt)
  File "/home/laika/OpenNMT-py/translate.py", line 32, in main
    attn_debug=opt.attn_debug
  File "/home/laika/OpenNMT-py/onmt/translate/translator.py", line 322, in translate
    batch, data.src_vocabs, attn_debug
  File "/home/laika/OpenNMT-py/onmt/translate/translator.py", line 511, in translate_batch
    return_attention=attn_debug or self.replace_unk)
  File "/home/laika/OpenNMT-py/onmt/translate/translator.py", line 658, in _translate_batch
    batch_offset=beam._batch_offset)
  File "/home/laika/OpenNMT-py/onmt/translate/translator.py", line 549, in _decode_and_generate
    decoder_in, memory_bank, memory_lengths=memory_lengths, step=step
  File "/home/laika/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/laika/OpenNMT-py/onmt/decoders/decoder.py", line 212, in forward
    tgt, memory_bank, memory_lengths=memory_lengths)
  File "/home/laika/OpenNMT-py/onmt/decoders/decoder.py", line 374, in _run_forward_pass
    emb = self.embeddings(tgt)
  File "/home/laika/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/laika/OpenNMT-py/onmt/modules/embeddings.py", line 245, in forward
    source = self.make_embedding(source)
  File "/home/laika/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/laika/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/laika/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/laika/OpenNMT-py/onmt/modules/util_class.py", line 25, in forward
    assert len(self) == len(inputs_)
AssertionError
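The failing line compares the number of embedding tables with the number of input columns. Below is a simplified pure-Python stand-in for that check, not the actual OpenNMT-py code; the class name `ElementwiseStandIn` and the toy tables are invented for illustration. One possible reading, which is an unconfirmed assumption, is that the decoder was built with two tables (word plus one feature, as in the model dump above) but receives only word indices during translation.

```python
class ElementwiseStandIn:
    """Simplified stand-in: one lookup table per input feature column."""

    def __init__(self, tables):
        self.tables = tables

    def __len__(self):
        return len(self.tables)

    def forward(self, rows):
        # rows: one entry per time step, each a list [word_idx, feat_0_idx, ...]
        columns = list(zip(*rows))
        # Same shape of check as the assertion in the traceback.
        assert len(self) == len(columns), (
            f"expected {len(self)} columns, got {len(columns)}"
        )
        return [[table[i] for i in col] for table, col in zip(self.tables, columns)]


# Hypothetical toy tables: a word table and a case-feature table.
layer = ElementwiseStandIn([{5: "w5", 6: "w6"}, {2: "C", 3: "L"}])

ok = layer.forward([[5, 2], [6, 3]])  # word + case column: the check passes

try:
    layer.forward([[5], [6]])  # word column only: the check fails
except AssertionError as err:
    print("AssertionError:", err)
```

If this reading is right, the mismatch would be on the target side rather than in the source file, which would fit with preprocess and train running cleanly.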
I would really appreciate any help!
Have a nice day!
Miguel