Translation_server crashes on a bad tokenization entry

When a bad tokenization is sent to the translation_server, it crashes with an error like this:
```
/home/dev8/torch/install/bin/luajit: ./onmt/utils/Features.lua:19: all words must have the same number of features
stack traceback:
	[C]: in function 'assert'
	./onmt/utils/Features.lua:19: in function 'extract'
	tools/translation_server.lua:35: in function 'translateMessage'
	tools/translation_server.lua:95: in function 'main'
	tools/translation_server.lua:103: in main chunk
	[C]: in function 'dofile'
	...dev8/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50
```

This is bad behaviour: once the server has crashed, it can no longer reply to any other request, even a well-tokenized one. A single misconfigured client in the pool of client machines is fatal for all the others…

The correct behaviour would be to reply with a simple error message instead.


I agree this is bad. Added an issue.

The proposed fix is simply to catch these errors and return an error message to the client.
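A minimal sketch of that fix, using Lua's `pcall` to trap the assertion instead of letting it kill the server loop. Note this is an illustration, not the actual OpenNMT code: `translateMessage` below is a stand-in that simulates the assert in `Features.lua`, and `handleRequest` is a hypothetical wrapper name.

```lua
-- Stand-in for tools/translation_server.lua's translateMessage;
-- onmt/utils/Features.lua asserts when tokens carry differing feature counts.
local function translateMessage(msg)
  assert(msg.wellFormed, "all words must have the same number of features")
  return { text = "translated: " .. msg.text }
end

-- Hypothetical wrapper: run the translation in protected mode so a
-- malformed request produces an error reply instead of a crash.
local function handleRequest(msg)
  local ok, result = pcall(translateMessage, msg)
  if ok then
    return result
  end
  -- 'result' holds the error string from assert(); send it back to the client
  return { error = tostring(result) }
end
```

With this wrapper, a bad request from one client yields `{ error = "..." }` for that client only, and the server keeps serving everyone else.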


I agree it is bad too - however, I am curious about the problem @Etienne38. Were you actually using features, or did the problem come from the accidental presence of the feature separator character in the corpus?

I’m using features. The problem occurred when I sent a sentence containing a token that was not properly annotated with its feature.
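For context, a hedged sketch of the invariant that `Features.lua` enforces: every token in a sentence must carry the same number of separator-delimited features. The separator `'￨'` and the helper names below are assumptions for illustration, not the library's actual code.

```lua
local SEP = '￨'  -- assumed feature separator between a word and its features

-- Count how many separator occurrences (i.e. features) a token carries.
local function countFeatures(token)
  local n = 0
  for _ in token:gmatch(SEP) do n = n + 1 end
  return n
end

-- Return true if all tokens have the same feature count, else false + message.
-- This mirrors the "all words must have the same number of features" assert.
local function checkSentence(tokens)
  local expected = countFeatures(tokens[1])
  for i = 2, #tokens do
    if countFeatures(tokens[i]) ~= expected then
      return false, ("token %d has a different number of features"):format(i)
    end
  end
  return true
end
```

So a sentence like `the￨DET cat￨NOUN` passes, while `the￨DET cat` (one token missing its feature) trips the assertion and, before the fix, crashed the server.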

OK, thanks! In general we need to handle exceptions better, all around the code.
