I’m having trouble using GloVe word embeddings. I generated the embeddings with the author’s scripts, and I know they need to be torch-serialized tensors before I can use them for training. I found some tutorials on this forum that use preprocess.py to generate the vocabulary file and then embeddings_to_torch.py to produce the tensors. My problem is that I couldn’t find preprocess.py in the GitHub repository (https://github.com/OpenNMT/OpenNMT-py), and I have no idea how to convert the embeddings into tensors. Can someone help me understand how to do this?
So how can I create the tensors if preprocess.py doesn’t exist anymore?
It’s done on the fly here: https://github.com/OpenNMT/OpenNMT-py/blob/fa34132067aeb50339843d1a08f79e6597da3e32/onmt/bin/train.py#L37
Can you explain how to do this step by step? I have my word embeddings in a .txt file; what do I do from here? Sorry for the stupid question, but I’m very new to this.
The steps are documented in the link I already provided: https://opennmt.net/OpenNMT-py/FAQ.html
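To make that a bit more concrete, here is a rough sketch of the idea, not the library's own code. As far as I recall from the FAQ, the relevant config keys are both_embeddings, embeddings_type and word_vec_size, and word_vec_size has to match the dimension of your vectors; please check the FAQ for the exact names. The helper functions and the path glove_dir/vectors.txt below are made up for illustration, and the parsing assumes one token per line followed by space-separated numbers, with no spaces inside tokens.

```python
def glove_vector_size(lines):
    """Return the vector dimension of GloVe-style text data.

    `lines` is any iterable of lines (an open file handle works).
    Assumes each line looks like: token v1 v2 ... vN
    """
    first_line = next(iter(lines))
    # Everything after the first field is a vector component.
    return len(first_line.split()) - 1


def embedding_options(path, vector_size):
    """Collect the embedding-related config entries (names as recalled from the FAQ)."""
    return {
        "both_embeddings": str(path),  # same vectors for encoder and decoder
        "embeddings_type": "GloVe",
        "word_vec_size": vector_size,  # must match the pretrained vectors
    }


def to_yaml_lines(options):
    """Render a flat dict as naive `key: value` lines (no quoting or escaping)."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


# Tiny in-memory sample standing in for the real .txt file.
sample = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
size = glove_vector_size(sample)
# "glove_dir/vectors.txt" is a placeholder path, not a real file.
print(to_yaml_lines(embedding_options("glove_dir/vectors.txt", size)))
```

The printed lines are what you would paste into your training YAML file. With a real file you would pass an open file handle instead of the sample list.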
I managed to download OpenNMT-py 2.0, but I ran into another problem. This is the trace:
Traceback (most recent call last):
  File "/…/…/envNMT/bin/onmt_train", line 8, in <module>
  File "/…/…/envNMT/lib/python3.7/site-packages/onmt/bin/train.py", line 169, in main
  File "/…/…/envNMT/lib/python3.7/site-packages/onmt/bin/train.py", line 103, in train
    checkpoint, fields, transforms_cls = _init_train(opt)
  File "/…/…/envNMT/lib/python3.7/site-packages/onmt/bin/train.py", line 58, in _init_train
  File "/…/…/envNMT/lib/python3.7/site-packages/onmt/utils/parse.py", line 131, in validate_prepare_opts
  File "/…/…/envNMT/lib/python3.7/site-packages/onmt/utils/parse.py", line 29, in _validate_data
    for cname, corpus in corpora.items():
AttributeError: 'str' object has no attribute 'items'
Can someone help me to understand what went wrong?
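You need to write a valid YAML configuration, as shown in the docs: https://opennmt.net/OpenNMT-py/quickstart.html#step-1-prepare-the-data. The AttributeError most likely means the data option was read as a plain string, whereas the training code expects a mapping of named corpora that it can iterate over with .items().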
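If it helps, here is a small Python sketch of the shape the data section should have once the YAML is loaded, plus the failure you hit when it is a bare string. check_corpora is a hypothetical helper I wrote for illustration, not part of OpenNMT-py, and the file names are placeholders. The corpus names and the path_src/path_tgt keys follow the quickstart as far as I remember it.

```python
def check_corpora(data):
    """Raise a readable error unless `data` is a mapping of named corpora.

    Hypothetical pre-flight check; the library performs its own validation.
    """
    if not isinstance(data, dict):
        raise TypeError(
            f"`data` must map corpus names to settings, got {type(data).__name__}"
        )
    for name, corpus in data.items():
        if not isinstance(corpus, dict) or "path_src" not in corpus:
            raise ValueError(f"corpus {name!r} needs at least a `path_src` entry")
    return sorted(data)


# Shape used in the quickstart (file names here are placeholders).
good = {
    "corpus_1": {"path_src": "src-train.txt", "path_tgt": "tgt-train.txt"},
    "valid": {"path_src": "src-val.txt", "path_tgt": "tgt-val.txt"},
}
print(check_corpora(good))

# A bare string is the same kind of input that triggers the error in the trace.
try:
    check_corpora("src-train.txt")
except TypeError as err:
    print(err)
```

In the YAML file this corresponds to a data: key with corpus_1: and valid: nested under it, each with its own path_src and path_tgt lines, rather than data: followed directly by a file path.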