Is there a way to combine BPE (byte pair encoding) with a factored model, for instance with a word|lemma|pos combination?
You can use features with a BPE model. In that case, you repeat the “factors” (what we call features) for each subtoken.
For instance, the sentence:
How￨WRB can￨MD I￨PRP encode￨VB an￨DT audio￨JJ file￨NN ?￨?
becomes, after BPE and case markup:
how￨C￨WRB can￨L￨MD i￨C￨PRP en￭￨L￨VB code￨L￨VB an￨L￨DT audio￨L￨JJ file￨L￨NN ?￨N￨?
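The idea above can be sketched in a few lines of Python. This is not the OpenNMT Tokenizer's implementation. It is a minimal sketch under a few assumptions: the BPE step is replaced by a hypothetical toy lookup table (`bpe_table`) instead of a learned model, the case labels (L = lowercase, C = capitalized, U = uppercase, M = mixed, N = no letters) are an assumed convention that reproduces the example, and `￨` (U+FFE8) and `￭` (U+FFED) are used as the feature separator and joiner, as in the example lines.

```python
SEP = "\uffe8"     # ￨ feature separator, as in the example above
JOINER = "\uffed"  # ￭ marks a subtoken that continues into the next one


def case_feature(piece: str) -> str:
    """Assumed case classes: L=lower, C=capitalized, U=upper, M=mixed, N=no letters."""
    letters = [c for c in piece if c.isalpha()]
    if not letters:
        return "N"
    if all(c.islower() for c in letters):
        return "L"
    if all(c.isupper() for c in letters):
        # A single uppercase letter such as "I" counts as capitalized.
        return "U" if len(letters) > 1 else "C"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "C"
    return "M"


def split_word(word: str, bpe_table: dict) -> list:
    """Toy stand-in for BPE: look up the lowercased word in a hypothetical table,
    then slice the original (cased) word at the same boundaries."""
    pieces = bpe_table.get(word.lower(), [word.lower()])
    out, i = [], 0
    for p in pieces:
        out.append(word[i:i + len(p)])
        i += len(p)
    return out


def apply_bpe_with_features(line: str, bpe_table: dict) -> str:
    """Split each word into subtokens and repeat the word's features on every subtoken,
    inserting a lowercase surface form plus a case feature in front of them."""
    out = []
    for token in line.split():
        word, *feats = token.split(SEP)
        pieces = split_word(word, bpe_table)
        for k, piece in enumerate(pieces):
            surface = piece.lower()
            if k < len(pieces) - 1:
                surface += JOINER  # non-final subtoken is joined to the next one
            out.append(SEP.join([surface, case_feature(piece), *feats]))
    return " ".join(out)


if __name__ == "__main__":
    src = "How|WRB can|MD I|PRP encode|VB an|DT audio|JJ file|NN ?|?".replace("|", SEP)
    print(apply_bpe_with_features(src, {"encode": ["en", "code"]}))
```

Run on the sentence above with a table that splits only "encode", it prints the same line as the BPE example: every subtoken of "encode" carries the `VB` tag, and the case markup becomes an extra feature in front of the original ones.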
Thank you, I’ll try it as soon as my GPUs are free.
Since this thread is from 2017, I was wondering if factored NMT is still supported in the current version of OpenNMT (as of April 2021)?