Combining BPE and factored model

vincent · August 29, 2017, 3:08pm

Is there a way to combine using the BPE (byte pair encoding) with a factored model, for instance with word|lemma|pos combination?

jean.senellart · August 31, 2017, 5:41pm

You can use features with a BPE model. In that case, you repeat the “factors” (what we call features) on each subtoken produced by BPE.

For instance, the sentence:

How￨WRB can￨MD I￨PRP encode￨VB an￨DT audio￨JJ file￨NN ?￨?

becomes, after BPE and case markup:

how￨C￨WRB can￨L￨MD i￨C￨PRP en￭￨L￨VB code￨L￨VB an￨L￨DT audio￨L￨JJ file￨L￨NN ?￨N￨?
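
To make the transformation concrete, here is a minimal Python sketch that reproduces the example above. It is not the OpenNMT tokenizer: `toy_segment` and `TOY_SPLITS` are hypothetical stand-ins for a learned BPE model, and the reading of the case tags (C = capitalized, L = lowercase, N = no case, plus U and M for uppercase and mixed) is an assumption inferred from the example. The point is only that each subtoken gets its own case feature while the word-level feature (here the POS tag) is copied onto every piece.

```python
SEP = "\uffe8"     # feature separator, the "￨" character in the examples above
JOINER = "\uffed"  # joiner "￭", marks a subtoken that continues into the next one

# Hypothetical stand-in for a learned BPE model: only "encode" gets split.
TOY_SPLITS = {"encode": ["en", "code"]}


def toy_segment(word):
    """Split a word into pieces, preserving its original casing."""
    pieces = TOY_SPLITS.get(word.lower(), [word.lower()])
    out, start = [], 0
    for piece in pieces:
        out.append(word[start:start + len(piece)])
        start += len(piece)
    return out


def case_feature(piece):
    """Assumed tag set: C capitalized, U uppercase, L lowercase, M mixed, N no case."""
    letters = [ch for ch in piece if ch.isalpha()]
    if not letters:
        return "N"
    if all(ch.islower() for ch in letters):
        return "L"
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return "U"
    if letters[0].isupper() and all(ch.islower() for ch in letters[1:]):
        return "C"
    return "M"


def expand(tagged_sentence):
    """Apply the toy segmentation and repeat each word's feature on every subtoken."""
    out = []
    for token in tagged_sentence.split():
        word, pos = token.split(SEP)
        pieces = toy_segment(word)
        for i, piece in enumerate(pieces):
            is_last = i == len(pieces) - 1
            surface = piece.lower() + ("" if is_last else JOINER)
            out.append(SEP.join([surface, case_feature(piece), pos]))
    return " ".join(out)


if __name__ == "__main__":
    # "|" is used here only to keep the literal ASCII; it is swapped for SEP.
    src = "How|WRB can|MD I|PRP encode|VB an|DT audio|JJ file|NN ?|?".replace("|", SEP)
    print(expand(src))
```

With more factors per word (for instance lemma and POS), the same idea applies: every extra field would be copied unchanged onto each subtoken of the word.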

vincent · September 1, 2017, 7:27am

Thank you. I’ll try it as soon as my GPUs are free.

dimitarsh1 · April 16, 2021, 12:26pm

Since this thread is from 2017, I was wondering if factored NMT is still supported in the current version of OpenNMT (as of April 2021)?

Thanks.