What are the basic settings for adding linguistic features

jafr · November 13, 2021, 1:15pm

Hello
What are the basic settings (in the config files) for adding linguistic features when building the vocab?
I tried the following settings, but only vocab.src and vocab.tgt were generated.
In particular, I do not see any ‘vocab/vocab.f0’.
Is this normal?
(I use OpenNMT-py 2.0)

Thank you

jafr

data:
    train:
        path_src: train.src
        path_tgt: train.tgt
        src_feats:
            feat_0: train.f0

    valid:
        path_src: val.src
        path_tgt: val.tgt
        src_feats:
            feat_0: val.f0

n_sample: -1

save_data: vocab

src_vocab: vocab/vocab.src
tgt_vocab: vocab/vocab.tgt
src_feats_vocab:
    feat_0: vocab/vocab.f0
feat_merge: "sum"

anderleich · November 15, 2021, 8:52am

Hi @jafr ,

It seems a required configuration parameter is missing from the docs. You need to set apply_terminology: true in order to use linguistic features.

jafr · November 15, 2021, 9:56am

Thanks for the suggestion.
I added this line but nothing appends
I only obtain the files vocab.src and vocab.tgt.
I do not obtain ‘vocab.f0’.
I suppose this is not normal (?)

jafr · November 15, 2021, 12:49pm

I mean “nothing happens”…

anderleich · November 18, 2021, 3:43pm

This could be an example:

data:
    dataset:
        path_src: data.src
        path_tgt: data.tgt
        src_feats:
            feat_0: data.src.feats
            feat_1: data.src.feats.1
        transforms: [filterfeats, onmt_tokenize, inferfeats, filtertoolong]
apply_terminology: true
reversible_tokenization: "joiner"

Do not forget the FilterFeats and InferFeats transforms (filterfeats and inferfeats in the transforms list above).

anderleich · November 19, 2021, 4:26pm

Hi @jafr,

I’ve just realized that the apply_terminology: true flag is not needed anymore. I guess your issue comes from not setting the transforms mentioned above.

Sorry for the confusion.
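
For readers landing on this thread, here is a rough sketch of what the original config might look like once the advice above is applied. It simply merges jafr's file names with the transforms list and the reversible_tokenization setting from anderleich's example. Nobody in the thread tested this exact combination, so treat the placement of transforms on both corpora and the choice of tokenization options as assumptions rather than a confirmed fix.

```yaml
# Sketch only: jafr's original config plus the transforms from anderleich's example.
# Assumption: the same transforms list is applied to both train and valid.
data:
    train:
        path_src: train.src
        path_tgt: train.tgt
        src_feats:
            feat_0: train.f0
        transforms: [filterfeats, onmt_tokenize, inferfeats, filtertoolong]
    valid:
        path_src: val.src
        path_tgt: val.tgt
        src_feats:
            feat_0: val.f0
        transforms: [filterfeats, onmt_tokenize, inferfeats, filtertoolong]

# Copied from anderleich's example; belongs to the tokenization setup.
reversible_tokenization: "joiner"

n_sample: -1
save_data: vocab

src_vocab: vocab/vocab.src
tgt_vocab: vocab/vocab.tgt
src_feats_vocab:
    feat_0: vocab/vocab.f0
feat_merge: "sum"
```

The apply_terminology flag is left out on purpose, since the last reply says it is no longer needed.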