Hi, since BPE splits rare words into subword units or even single characters, the target side contains almost no unknown words during training, so the copy attention can’t be trained sufficiently.
I was considering training a joint BPE model on both the source and target languages so that they share part of the vocabulary (subwords) that can safely be copied. However, the problem described above still remains.
Are there any suggestions for getting both BPE and the copy mechanism to work properly together?
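
To make the joint BPE idea concrete, here is a minimal sketch of a toy BPE learner trained on the concatenated source and target text. It is not the API of any particular toolkit: the function names (`learn_bpe`, `merge_pair`, `segment`) and the tiny corpora are hypothetical and only for illustration. It shows that with one shared merge table, a name appearing on both sides is segmented into the same subword units, so each target unit has an identical source unit that could in principle be copied.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn an ordered list of BPE merges from a list of words (toy learner)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself to stay deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpora; a real setup would use the full training data.
src_corpus = "the city of Heidelberg is old".split()
tgt_corpus = "die Stadt Heidelberg ist alt".split()

# Joint model: one merge table learned on source and target together.
joint_merges = learn_bpe(src_corpus + tgt_corpus, num_merges=20)

src_units = segment("Heidelberg", joint_merges)
tgt_units = segment("Heidelberg", joint_merges)
print(src_units, tgt_units)
```

The sketch only illustrates the shared segmentation. It does not address the remaining issue raised above: because BPE leaves almost no unknown tokens on the target side, the copy mechanism still gets little training signal.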