POS tag + BPE/SentencePiece

Hello,

Is it possible to do both: POS tagging (as a feature) and BPE/SentencePiece?

If yes, how do we do it? And does it yield positive results?

My understanding is that the feature file needs to have the same number of tokens as the training file. Yet the words in the training file are broken down into smaller tokens by BPE/SentencePiece. So at first glance, it seems impossible to do both at the same time… unless we repeat the POS tag on each piece of the word?

I’m not even sure that would be useful… but if anyone has insight into whether it’s doable and how to do it, it would be really appreciated.
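
To make the "repeat the POS tag on each piece" idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration under assumptions: `align_pos_to_subwords` and `toy_segment` are hypothetical names, and `toy_segment` is a stand-in for a real BPE/SentencePiece model (it just cuts words into fixed-size chunks and marks continuations with `@@`). I have not checked whether this is what the toolkit expects or whether it helps.

```python
def align_pos_to_subwords(words, pos_tags, segment):
    """Repeat each word's POS tag on every subword piece of that word.

    `segment` is any callable mapping one word to a list of pieces.
    Returns two lists of equal length: the pieces and their tags.
    """
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    pieces, piece_tags = [], []
    for word, tag in zip(words, pos_tags):
        word_pieces = segment(word)
        pieces.extend(word_pieces)
        # One copy of the word-level tag per subword piece.
        piece_tags.extend([tag] * len(word_pieces))
    return pieces, piece_tags


def toy_segment(word, max_len=4):
    """Hypothetical stand-in for a subword model (NOT real BPE):
    fixed-size chunks, with '@@' on every chunk except the last."""
    chunks = [word[i:i + max_len] for i in range(0, len(word), max_len)]
    return [c + "@@" for c in chunks[:-1]] + chunks[-1:]


words = ["the", "translation", "works"]
tags = ["DET", "NOUN", "VERB"]
pieces, piece_tags = align_pos_to_subwords(words, tags, toy_segment)
print(" ".join(pieces))      # the tran@@ slat@@ ion work@@ s
print(" ".join(piece_tags))  # DET NOUN NOUN NOUN VERB VERB
```

Both output lines have six tokens, so the token counts of the training line and the feature line would match again after segmentation.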

Best regards,
Samuel

I actually came across the answer by accident today…

if anyone wondered what the answer was!


Hi Samuel

I have heard that using POS helps, but I am not so sure it makes a real difference. It also does not look easy to set up.

But I am replying because, unless I am wrong, recent OpenNMT versions had some problems with using several features as input. So before diving in, I would suggest verifying that multi-feature input works correctly. You can search for “features” in the forum.

Have a nice day!
Miguel
