My Chinese sentences have already been segmented with spaces.
I tried to use BPE with Chinese, but when I ran learn_bpe.lua, only the English words were encoded; the Chinese characters were left unchanged.
Thanks!
I have also tried Google's SentencePiece; the results are the same as with learn_bpe.lua.
Hello @netxiao
I am running a first experiment on English to Chinese.
Why are you trying BPE on already segmented text?
Do you expect some gain from using subwords?
By the way, if you have an existing experiment into Chinese, what kind of PPL do you get at convergence?
Thanks
My last model's PPL is 5.89; that was the first time I used BPE.
In my last test, BPE with Chinese worked fine.
Dear @netxiao
I would suggest using character-based encoding on the Chinese side.
You can have the English BPE-ed, but for Chinese I think it is better to limit the dictionary.
Cheers,
Dimitar
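For anyone who wants to try the character-based suggestion, here is a minimal sketch of how the Chinese side could be split into characters before training. It is not part of OpenNMT; the function names `is_cjk` and `to_char_level` are made up for illustration. It assumes that "Chinese character" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that runs of non-Chinese characters (Latin words, numbers) should be kept together as single tokens.

```python
def is_cjk(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as Chinese.
    return "\u4e00" <= ch <= "\u9fff"


def to_char_level(line):
    """Split every Chinese character into its own token.

    Runs of non-Chinese characters inside a token are kept together.
    """
    out = []
    for token in line.split():
        buf = ""  # collects a run of non-Chinese characters
        for ch in token:
            if is_cjk(ch):
                if buf:
                    out.append(buf)
                    buf = ""
                out.append(ch)
            else:
                buf += ch
        if buf:
            out.append(buf)
    return " ".join(out)


if __name__ == "__main__":
    print(to_char_level("我们 使用 BPE 模型"))
```

With this kind of preprocessing the Chinese vocabulary is bounded by the number of distinct characters in the corpus, which is the "limit the dictionary" effect mentioned above. Decoded output would then need the spaces between Chinese characters removed again.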
Thanks for your answer. You are right; I will test character-based encoding for the Chinese side.