How to know the case of UNK?

(Alexei Rudak) #1

Hello. I have two questions:

  1. If I translate some text with -replace_unk tagged and -tok {src,tgt}_case_feature, the result contains ⦅unk:xxxxx⦆ tokens, but only in lowercase. How can I find the case type of the unk?

  2. What does the # parameter mean in tokenization? In the docs it’s described as “then the sequence is considered as unique and is particularily useful to enforce translation of tags using GBS.”

Can somebody clarify what GBS is and give an example of how the # parameter works?