How to use BPE together with a copy mechanism?


#1

Hi, since BPE splits rare words into subword units or even single characters, the target side almost never contains unknown words. As a result, the copy attention rarely sees an unknown target word during training and can't be trained sufficiently.

I was considering training a joint BPE model on both the source and target languages so that they share part of the vocabulary (subwords), which would be fine to copy. However, the problem described above still remains.
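To make the joint BPE idea concrete, here is a minimal sketch in plain Python. It is a toy learner, not subword-nmt or SentencePiece; the function names (`learn_bpe`, `apply_merge`, `segment`) and the tiny word lists are made up for illustration only.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (toy version)."""
    vocab = Counter(tuple(w) for w in words)  # word as character tuple -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical toy data: source-side and target-side words.
src = ["transformer", "transformers", "attention"]
tgt = ["transformateur", "attention"]

# Joint model: merges are learned on source and target together.
merges = learn_bpe(src + tgt, num_merges=10)

src_vocab = {tok for w in src for tok in segment(w, merges)}
tgt_vocab = {tok for w in tgt for tok in segment(w, merges)}
shared = src_vocab & tgt_vocab  # subwords that could in principle be copied
print(sorted(shared))
```

Because both sides are segmented with the same merges, a word that appears on both sides is split identically, so its subwords are candidates for copying. It does not change the fact that the target never contains an unknown token.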

Does anyone have a suggestion for getting BPE and the copy mechanism to work properly together?