Seq_length and vocab_size do not change

abas · February 10, 2021, 5:17pm

INPUT:
!python OpenNMT-py/preprocess.py -src_seq_length 80 -tgt_seq_length 80 -src_vocab_size 30000 -tgt_vocab_size 30000 -lower -share_vocab .

- When I run this command, vocab_size and seq_length do not change.

-share_vocab: it merges the src and tgt vocab. What is its role, and can it be dispensed with?

-lower: what does it do?

OUTPUT:
[2021-02-10 09:58:59,525 INFO] * tgt vocab size: 10207.
[2021-02-10 09:58:59,540 INFO] * src vocab size: 10287.
[2021-02-10 09:58:59,540 INFO] * merging src and tgt vocab…
[2021-02-10 09:58:59,578 INFO] * merged vocab size: 17928.

best regards
Abas.

ymoslem · February 13, 2021, 11:19am

Dear Abas,

It is clear you are using a previous version, and it is difficult to get support for that. Why don't you try upgrading? There is a notebook for version 2.x.

When you do, you will be able to see all the relevant arguments of Build Vocab (used to be called “preprocess”), train, translate, etc.

Sharing the vocabulary (-share_vocab) is useful when the two languages 1) have the same script and 2) share vocabulary. For example, it might be useful for English and Spanish, or Arabic and Persian. It is not very useful for language pairs that fail one of these two conditions, like Hindi and Urdu, or Irish (Gaeilge) and German.
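To make the merging step concrete, here is a minimal sketch in plain Python. It is not OpenNMT-py's actual implementation: build_vocab is a hypothetical helper, the sentences are invented toy data, and the max_size and lower parameters only mimic what -src_vocab_size / -tgt_vocab_size and -lower appear to do in the command above.

```python
from collections import Counter


def build_vocab(sentences, max_size, lower=False):
    """Hypothetical helper: keep at most max_size of the most frequent tokens."""
    counts = Counter()
    for sentence in sentences:
        if lower:
            # Lowercasing folds "The" and "the" into a single entry.
            sentence = sentence.lower()
        counts.update(sentence.split())
    # max_size is an upper bound: a corpus with fewer distinct tokens
    # gives a smaller vocabulary, it is never padded up to max_size.
    return {token for token, _ in counts.most_common(max_size)}


# Toy data, invented for illustration only.
src_sentences = ["The hotel is central", "the taxi is here"]
tgt_sentences = ["El hotel es central", "el taxi está aquí"]

src_vocab = build_vocab(src_sentences, max_size=30000, lower=True)
tgt_vocab = build_vocab(tgt_sentences, max_size=30000, lower=True)

# Sharing the vocabulary amounts to taking the union of both sides,
# so tokens present in both languages are stored only once.
shared_vocab = src_vocab | tgt_vocab

print(len(src_vocab), len(tgt_vocab), len(shared_vocab))
```

The same reasoning would explain the log in the first post: the vocab size options act as maximums, so a corpus with only about 10,000 distinct tokens per side stays below 30000, and 10207 + 10287 = 20494 against a merged size of 17928 suggests that roughly 2,566 entries occur on both sides.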

I hope this helps!

Kind regards,
Yasmin

abas · February 14, 2021, 1:16pm

Hi Yasmin,
Yes, it is helpful.

I'm using the notebook from Park for the previous version, OpenNMT-py 1.x. I want to switch to OpenNMT-py 2.x and keep working in the same notebook (from Park), so I just need to install it (pip3 install git+https://github.com/OpenNMT/OpenNMT-py.git).

As for python OpenNMT-py/preprocess.py, train, and translate, they turn into
onmt_build_vocab
onmt_train
onmt_translate
or
python preprocess.py
python train.py
python translate.py
I'm not working with a YAML configuration file.

Am I correct?
I'll wait for some clarification from you.

best regards
Abas.