OpenNMT Forum

Tokenization with POS tagging


(Md Asadul Islam) #1

I can see that it is possible to use the tokenizer together with a POS tagger by using a hook.

I am trying to use BPE along with it, but as I understand it, BPE is applied first and POS tagging afterwards. Is it possible to apply POS tagging before BPE?

For the following sentence:
they demandedd a retrial.
I currently get:
they_PP dem__VVZ an__NP ded__NP d_SYM a_DT re__NP trial_NN _._SENT

But I am trying to get something like:
they_PP dem__VVZ an__VVZ ded__VVZ d_VVZ a_DT re__NP trial_NP _._SENT
Since demandedd is tagged as VVZ, I want all of its subtokens to be tagged as VVZ:
dem__VVZ an__VVZ ded__VVZ d_VVZ
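
To make the desired behaviour concrete, here is a rough Python sketch of the idea: tag whole words first, segment each word afterwards, and copy the word's tag onto every piece. It does not use the OpenNMT tokenizer or a real tagger. The names tag_then_segment, TOY_TAGS and TOY_BPE are hypothetical, the lookup tables are toy stand-ins for the tagger and the BPE model, and the "_" joiner is assumed from the output shown above. Punctuation handling is left out.

```python
from typing import Callable, List

JOINER = "_"  # assumption: the joiner marker as it appears in the output above


def tag_then_segment(
    words: List[str],
    tag_words: Callable[[List[str]], List[str]],
    bpe_segment: Callable[[str], List[str]],
) -> List[str]:
    """Tag whole words first, then copy each word's tag onto its BPE pieces."""
    tags = tag_words(words)
    out: List[str] = []
    for word, tag in zip(words, tags):
        pieces = bpe_segment(word)
        for i, piece in enumerate(pieces):
            # Every piece except the last one carries the joiner marker.
            marker = JOINER if i < len(pieces) - 1 else ""
            out.append(f"{piece}{marker}_{tag}")
    return out


# Toy stand-ins (hypothetical): a real setup would call the POS tagger
# and the BPE model here instead of these lookup tables.
TOY_TAGS = {"they": "PP", "demandedd": "VVZ", "a": "DT", "retrial": "NP"}
TOY_BPE = {"demandedd": ["dem", "an", "ded", "d"], "retrial": ["re", "trial"]}

result = tag_then_segment(
    ["they", "demandedd", "a", "retrial"],
    tag_words=lambda ws: [TOY_TAGS[w] for w in ws],
    bpe_segment=lambda w: TOY_BPE.get(w, [w]),
)
print(" ".join(result))
```

With these toy tables, all four pieces of demandedd come out tagged as VVZ, which is the behaviour I am after.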

Thanks in advance