YouTokenToMe - Up to 90x faster subword encoding

Came across this today. Anyone who has tried to train SentencePiece on a massive dataset knows what a pain it can be. This project appears to have a far more efficient BPE implementation, so I thought I would share.

Best,
Matt
