OpenNMT Forum

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>

Here is my .yml file:


# Where the samples will be written
save_data: run/example

# Where the vocab(s) will be written
src_vocab: run/example.vocab.src
tgt_vocab: run/example.vocab.tgt

# Prevent overwriting existing files in the folder
overwrite: False

# Corpus opts:
path_src: src-train.txt
path_tgt: tgt-train.txt
path_src: src-val.txt
path_tgt: tgt-val.txt

My working directory is: C:\Users\Dhar_7\OpenNMT-py\toy-ende

I ran the following command: onmt_build_vocab -config toy_en_de.yml -n_sample 10000

I got the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>

There is probably an issue with the way you created your .yaml configuration file (an encoding issue).
Can you try converting it to UTF-8, or saving it directly in UTF-8 encoding?

I don't understand what you mean. Could you please explain how to create a .yaml file with UTF-8 encoding?
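"Saving in UTF-8" means the bytes of the file on disk follow the UTF-8 encoding rather than, for example, UTF-16 or a legacy Windows code page. In Windows Notepad you can pick UTF-8 from the Encoding dropdown in the Save As dialog. If you would rather convert an existing file with a script, here is a minimal Python sketch. The function name `convert_to_utf8` and the `cp1252` fallback are my own assumptions for illustration, not part of OpenNMT, and I have not confirmed that this is the exact cause of your error.

```python
import codecs
from pathlib import Path


def convert_to_utf8(path, fallback_encoding="cp1252"):
    """Rewrite a text file in place as UTF-8 without a byte order mark.

    Hypothetical helper for illustration; it is not part of OpenNMT.
    """
    raw = Path(path).read_bytes()

    # Detect the current encoding from a leading byte order mark, if any.
    if raw.startswith(codecs.BOM_UTF8):
        text = raw.decode("utf-8-sig")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")
    else:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: the file uses a legacy Windows code page.
            # Change fallback_encoding if your system uses another one.
            text = raw.decode(fallback_encoding)

    # Write the same text back as plain UTF-8 bytes.
    Path(path).write_bytes(text.encode("utf-8"))
    return text
```

You would call it from the folder that contains the config, for example `convert_to_utf8("toy_en_de.yml")`, and then run onmt_build_vocab again. If the error persists, the same conversion may be worth trying on the corpus text files. Keep a backup copy first, since the file is overwritten in place.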

Some useful links: