Hi everyone. I am working on Chinese-to-English translation. I preprocessed my Chinese data with character-level tokenization, using a token in place of spaces. However, when I run process.py, I get the error “Dataset cannot contain special characters”.
If you can share files and steps to reproduce the exact issue, it would help.
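As a starting point for a reproducible example, here is a minimal sketch of the kind of preprocessing described in the question. The function name `char_tokenize` and the placeholder `<sp>` are assumptions made for illustration only; the actual token used and the checks inside process.py are not shown in this thread, so this may not match the real setup.

```python
def char_tokenize(line: str, space_token: str = "<sp>") -> str:
    """Split a line into characters, replacing each whitespace character
    with a placeholder token (hypothetical: "<sp>")."""
    tokens = []
    for ch in line.strip():
        if ch.isspace():
            # Keep the original word boundary as an explicit token.
            tokens.append(space_token)
        else:
            tokens.append(ch)
    # Tokens are separated by single spaces in the output.
    return " ".join(tokens)


if __name__ == "__main__":
    print(char_tokenize("我 爱你"))  # prints: 我 <sp> 爱 你
```

If the placeholder contains characters such as `<` or `>`, that might be what the error is objecting to, but that is only a guess. Posting one or two preprocessed lines along with the exact command would make it possible to confirm.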