Add multi-threading to tokenization


(jean.senellart) #1

It would be nice to have a multi-threading option in tokenize.lua and detokenize.lua, to speed up tokenization of large documents.


(jean.senellart) #2

Done in tokenization, with the option:

  • -nparallel: number of parallel threads used to run the tokenization

With 3 parallel workers on my laptop, the speed-up is up to 2x.
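For readers curious how an `-nparallel` style option can work, here is a minimal Python sketch of the general idea: split the input lines into contiguous chunks, tokenize each chunk in its own worker, and reassemble the results in the original order. It is not the tokenize.lua implementation. `tokenize_line`, `tokenize_chunk`, `tokenize_parallel` and `executor_factory` are hypothetical names, and the regex tokenizer is a toy stand-in for the real tokenization rules. Worker processes are used by default because CPython threads do not run CPU-bound Python code in parallel.

```python
import re
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List

# Hypothetical toy tokenizer (NOT OpenNMT's rules): words and punctuation
# marks become separate space-delimited tokens.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_line(line: str) -> str:
    """Tokenize a single line with the toy rule above."""
    return " ".join(_TOKEN_RE.findall(line))


def tokenize_chunk(chunk: List[str]) -> List[str]:
    """Worker task: tokenize a contiguous block of lines."""
    return [tokenize_line(line) for line in chunk]


def tokenize_parallel(
    lines: List[str],
    nparallel: int = 3,
    executor_factory: Callable[..., Executor] = ProcessPoolExecutor,
) -> List[str]:
    """Split `lines` into at most `nparallel` contiguous chunks, tokenize each
    chunk in a separate worker, and concatenate the results in input order."""
    if nparallel <= 1 or len(lines) < 2:
        return tokenize_chunk(lines)  # nothing to gain from parallelism
    size = -(-len(lines) // nparallel)  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with executor_factory(max_workers=nparallel) as pool:
        # Executor.map yields results in submission order, so output line i
        # still corresponds to input line i.
        results = pool.map(tokenize_chunk, chunks)
        return [out for chunk_out in results for out in chunk_out]
```

Usage: `tokenize_parallel(lines, nparallel=3)`. On platforms that start worker processes with `spawn`, call it from under an `if __name__ == "__main__":` guard. In this sketch, reading the input, chunking and joining the output stay sequential, and starting workers has a cost, so a speed-up below the number of workers is to be expected.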


(jean.senellart) #3

The same option is available in detokenization, but the speed-up is small since detokenization is not very CPU-intensive.