In Tutorial, UnicodeEncodeError

WoongPro · November 2, 2020, 12:37pm

Hi, thanks for clicking.

I followed the OpenNMT-py tutorial on GitHub, but I ran into a problem. In step 1, I ran onmt_build_vocab -config toy_en_de.yaml -n_sample 10000, but something went wrong with tgt. The error is:
UnicodeEncodeError: 'cp949' codec can't encode character '\ufffd' in position 0: illegal multibyte sequence
How can I fix this?

francoishernandez · November 2, 2020, 1:12pm

I think there is an encoding mismatch between your data (probably utf-8) and the default encoding of your system (cp949 it seems).
Some file operations probably rely on the system default encoding implicitly and should specify utf-8 explicitly.
Can you post the whole trace to properly identify where this is triggered?
Thanks.
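
To illustrate the kind of mismatch I mean, here is a minimal sketch in plain Python. It is not taken from the OpenNMT-py code; the file name is made up and the only assumption is that your system default is cp949, as the error message suggests. It shows that the character from your error cannot be encoded in cp949, while writing with an explicit utf-8 encoding works.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when no encoding= argument is given.
print("default encoding:", locale.getpreferredencoding(False))

# U+FFFD (replacement character) is the character named in the error message.
text = "\ufffd\n"

# Encoding it as cp949 fails in the same way as the reported error.
try:
    text.encode("cp949")
    cp949_ok = True
except UnicodeEncodeError as err:
    cp949_ok = False
    print("cp949 failed:", err)

# Passing encoding="utf-8" explicitly bypasses the system default.
# "sample.txt" is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print("utf-8 round trip ok:", round_trip == text)
```

If the implicit call sits inside library code that you would rather not edit, one possible workaround is Python's UTF-8 mode (Python 3.7+), enabled by setting the environment variable PYTHONUTF8=1 before running the command. I have not verified that on your setup.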

WoongPro · November 3, 2020, 2:23pm

Thanks for your answer. Following your suggestion, I solved that error and got through step 1.
But I ran into another problem in step 2.
The error is:
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
I think this is a GPU-related problem.
How can I solve it?

francoishernandez · November 3, 2020, 4:31pm

I don’t know much about using PyTorch with CUDA on Windows. You can probably start by trying this in a Python shell:

import torch
torch.cuda.is_available()

and see what you get.
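
If it helps, a slightly fuller version of the same check is sketched below. The helper name cuda_report is made up for this post. My guess, which I have not confirmed for your machine, is that this kind of AttributeError shows up when the installed PyTorch is a CPU-only build; in that case torch.version.cuda is None and torch.cuda.is_available() returns False.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if a CUDA build, a working driver, and a GPU are all present.
        "cuda_available": torch.cuda.is_available(),
    }


report = cuda_report()
print(report)
```

If built_with_cuda comes back as None, installing a CUDA-enabled PyTorch build, or running the training step on CPU, would be the next thing to try.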