I remember preprocessing the same data before without problems, but this time something strange happened.
I have both *.train.pt and *.valid.pt; however, the *vocab.pt is effectively empty. The sizes of train.pt and valid.pt look normal, but I cannot open them to make sure. I opened the vocab.pt and found only the 4 default tokens inside.
This hasn’t happened to me before, and I cannot figure out the reason. I’m processing Chinese data, and it works normally when the data has been word-segmented. It fails only when I split the data into single characters.
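In case it helps with debugging, here is a minimal sketch of the kind of sanity check that can be run on the character-split text before preprocessing. It is not the actual preprocessing code: the names `split_to_chars` and `count_tokens` and the two sample sentences are made up for illustration, and it assumes tokens are expected to be separated by whitespace.

```python
from collections import Counter


def split_to_chars(line):
    """Put a single space between every non-whitespace character."""
    return " ".join(ch for ch in line if not ch.isspace())


def count_tokens(lines):
    """Count whitespace-delimited tokens across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Hypothetical sample sentences standing in for the real corpus.
sample = ["我爱自然语言处理", "我爱机器翻译"]
char_lines = [split_to_chars(s) for s in sample]
vocab = count_tokens(char_lines)

# If this prints a tiny vocabulary for a large file, the characters are
# probably not separated the way the tokenizer expects.
print(len(vocab), vocab.most_common(3))
```

If the count looks reasonable here but the vocab file still holds only the default tokens, the problem is more likely in the preprocessing options than in the text itself.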
Does anyone have an idea that could help me fix or debug this?