!python OpenNMT-py/preprocess.py -src_seq_length 80 -tgt_seq_length 80 -src_vocab_size 30000 -tgt_vocab_size 30000 -lower -share_vocab .
- When I run this command, the vocab size and seq_length do not change.
- -share_vocab merges the src and tgt vocab. What is its role, and can it be dispensed with?
- -lower: what does it do?
[2021-02-10 09:58:59,525 INFO] * tgt vocab size: 10207.
[2021-02-10 09:58:59,540 INFO] * src vocab size: 10287.
[2021-02-10 09:58:59,540 INFO] * merging src and tgt vocab…
[2021-02-10 09:58:59,578 INFO] * merged vocab size: 17928.
It is clear you are using a previous version, and it is difficult to get support for that. Why don't you try to upgrade? There is a notebook for version 2.x.
Once you do, you will be able to see all the relevant arguments of Build Vocab (formerly called “preprocess”), train, translate, etc.
Sharing the vocabulary is useful when the two languages 1) have the same script, and 2) share vocabulary. For example, it might be useful for English and Spanish, or Arabic and Persian. It is not very useful for language pairs that fail one of these two conditions, like Hindi and Urdu (different scripts), or Irish (Gaeilge) and German (little shared vocabulary).
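To make the log above concrete: 10207 + 10287 = 20494, but the merged vocab is 17928, so 2566 tokens appear on both sides and are stored only once. Here is a rough sketch of the idea in plain Python. It is not OpenNMT-py's actual code; `build_vocab`, its parameters, and the toy sentences are made up for illustration, and I am assuming the vocab size option acts as an upper limit rather than a target.

```python
from collections import Counter


def build_vocab(sentences, lower=False, max_size=None):
    """Hypothetical helper: count whitespace tokens, keep the most frequent."""
    counts = Counter()
    for sentence in sentences:
        if lower:
            # Lowercasing folds "The" and "the" into a single entry.
            sentence = sentence.lower()
        counts.update(sentence.split())
    # max_size is only a cap: with fewer distinct tokens, all are kept.
    return {token for token, _ in counts.most_common(max_size)}


# Toy parallel data (invented for this example).
src = ["The hotel is in Madrid", "the taxi is here"]
tgt = ["El hotel está en Madrid", "el taxi está aquí"]

src_vocab = build_vocab(src, lower=True, max_size=30000)
tgt_vocab = build_vocab(tgt, lower=True, max_size=30000)

merged = src_vocab | tgt_vocab   # what a shared vocab would contain
shared = src_vocab & tgt_vocab   # tokens counted once instead of twice

print(len(src_vocab), len(tgt_vocab), len(merged), sorted(shared))
```

In this toy case each side has 7 tokens and the merged vocab has 11, because "hotel", "madrid" and "taxi" are shared. It also suggests why a 30000 limit appears to change nothing: if the data only has about 10000 distinct tokens per side, the limit is never reached.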
I hope this helps!
Yes, it is helpful.
So I'm using the notebook from park for the previous version, OpenNMT-py 1.x. I want to switch to OpenNMT-py 2.x and keep working in the same notebook (from park), so I just need to install it (pip3 install git+https://github.com/OpenNMT/OpenNMT-py.git).
and about python OpenNMT-py/preprocess.py, train, translate. turn to
I'm not working with a YAML configuration file.
Am I correct?
I'll wait for your clarification.