How to use BPE along with copying mechanism?

Hi, since BPE splits rare words into subword units or even characters, unknown words never appear on the target side. As a result, the copying attention can't be trained sufficiently, because it never sees target unknown words during training.

I was considering training a joint BPE model on both the source and target languages so that they share part of the vocabulary (subwords), which could then be copied. However, the problem mentioned above still remains.
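Below is a minimal sketch of the joint BPE idea. It assumes the standard greedy merge-learning procedure (count adjacent symbol pairs, merge the most frequent, repeat). A single merge list is learned from the concatenated source and target text, so any word is segmented the same way on both sides, and its subword pieces are identical tokens that a copy mechanism could point at. The helper names (`learn_joint_bpe`, `segment`, ...) and the toy corpus are hypothetical and do not reflect any particular toolkit's API. The sketch only illustrates the shared vocabulary; it does not solve the copy-attention training problem described above.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def get_vocab(lines):
    """Map each word (as a tuple of characters plus an end-of-word marker) to its frequency."""
    vocab = Counter()
    for line in lines:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1
    return vocab


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def learn_joint_bpe(source_lines, target_lines, num_merges):
    """Learn ONE merge list from the concatenation of source and target text."""
    vocab = get_vocab(list(source_lines) + list(target_lines))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_symbols(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


# Hypothetical toy parallel corpus with a shared name on both sides.
src = ["smith met smithson", "smith smiled"]
tgt = ["smith traf smithson", "smith lachte"]
merges = learn_joint_bpe(src, tgt, num_merges=20)

for w in ["smith", "smithson", "smithers"]:
    # The same merge list is used for both languages, so the pieces match.
    print(w, segment(w, merges))
```

Because both sides share one merge list, an unseen name such as "smithers" still starts with the same `smith` piece that appears in the source sentence, which is what would make subword-level copying possible.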

Are there any suggestions for getting both BPE and the copying mechanism to work properly?