ZH->EN uppercases the second word of the sentence instead of the first?

I’m experimenting with ZH->EN translation. My first model seems to put the uppercase letter on the second word of the sentence, rather than the first.

Of course, I’m using the no-feature-shifting patch, but it is installed in both the training and the translation code. I just double-checked. I haven’t seen this uppercase problem with other languages.

Is this a known behaviour of ZH->EN translation with ONMT? Or do I need to investigate the no-feature-shifting patch further?

This is most likely due to the patch. Could you share the output of git diff restricted to the patch?

I haven’t updated the code recently. It was working properly with other languages.

Here are the 3 modifications I checked:

What appears to be the first generated case?

Nearly all sentences start like this:

The raw output with the feature annotations would be more helpful.
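For anyone wanting to inspect such raw output programmatically, here is a minimal sketch. It assumes each token is written as word, separator, case label, with "￨" (U+FFE8) as the separator and "C" as the label for a capitalized word; both are assumptions about the output format, the function names are hypothetical, and the sample line in the usage note is made up.

```python
SEP = "\uffe8"  # assumed feature separator; adjust to match your output


def split_features(line):
    """Split an annotated line into (word, feature) pairs."""
    pairs = []
    for tok in line.split():
        # partition() leaves the feature empty when no separator is present
        word, _, feat = tok.partition(SEP)
        pairs.append((word, feat))
    return pairs


def first_capitalized_index(line, label="C"):
    """Return the position of the first token carrying `label`, or None."""
    for i, (_, feat) in enumerate(split_features(line)):
        if feat == label:
            return i
    return None
```

On a made-up line such as "the￨L cat￨C sat￨L", first_capitalized_index returns 1, which is the symptom described in this thread: the capitalized label sits on the second token instead of the first.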

Found it!

It was a combination of two bugs in my own code, which mishandled ZH-specific entries when dealing with upper/lowercase characters (not surprisingly, this was not handled well for ZH specifically).
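As an illustration only (this is not the code from this thread), here is a sketch of how case handling can go wrong on Chinese text. Chinese characters have no case, so a check that compares a character with its uppercased form treats them as capitalized. The function names and the N/L/C/U/M labels below are hypothetical choices made for this example.

```python
def naive_is_capitalized(token):
    """Buggy check: caseless characters equal their own upper() form."""
    return token[:1] == token[:1].upper()


def case_feature(token):
    """Return a case label computed from the cased letters only.

    N = no cased letters, L = all lowercase, C = first cased letter
    uppercase and the rest lowercase, U = all uppercase, M = mixed.
    """
    cased = [ch for ch in token if ch.isupper() or ch.islower()]
    if not cased:
        return "N"  # digits, punctuation, Chinese characters, ...
    if all(ch.islower() for ch in cased):
        return "L"
    if cased[0].isupper() and all(ch.islower() for ch in cased[1:]):
        return "C"
    if all(ch.isupper() for ch in cased):
        return "U"
    return "M"
```

With these definitions, naive_is_capitalized("中文") is True even though the token has no case at all, while case_feature("中文") returns "N" and case_feature("Hello") returns "C".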

Thanks for your time!
