Thanks for the reply.
By mentioning BPE, I meant to imply that I removed <unk> from my corpus; I'm sure there is no <unk> in the preprocessed corpus. I also trained three more models on non-overlapping data to see whether anything would change, but nothing did: in all models, <unk> is generated at the end of the outputs.
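In case it helps, this is roughly how I checked the preprocessed files (a minimal sketch only; the file names are placeholders for my actual BPE-processed source and target files):

```python
# Minimal check that no <unk> survives in the preprocessed corpus.
# File names below are placeholders for my actual BPE-processed files.
files = ["train.src.bpe", "train.tgt.bpe", "valid.src.bpe", "valid.tgt.bpe"]

for path in files:
    hits = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if "<unk>" in line:
                hits += 1
                print(f"{path}:{line_no}: {line.strip()}")
    print(f"{path}: {hits} lines containing <unk>")
```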
Since I use OpenNMT-py for summarization, the texts on both the source and target sides are English. Surprisingly, the <unk> token is generated only at the end of every output sentence and nowhere else. The trained models work well if I ignore all the <unk>s, but as long as I see this token, I'm not confident about the results. I went through the code to see whether the cause might be there, but I couldn't find anything.
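By "ignore" I mean post-processing roughly like this (again just a sketch; pred.txt stands for the predictions file written by translate.py):

```python
# Strip trailing <unk> tokens from the model output before evaluation.
# "pred.txt" is a placeholder for the predictions file from translate.py.
with open("pred.txt", encoding="utf-8") as f_in, \
        open("pred.clean.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:
        tokens = line.split()
        # Drop <unk> only at the end of the sentence, where it appears.
        while tokens and tokens[-1] == "<unk>":
            tokens.pop()
        f_out.write(" ".join(tokens) + "\n")
```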
Do you have any idea what the problem is or how I can find it?