Came across this today. Anyone who has tried to train SentencePiece on a massive dataset knows what a pain it can be. This project appears to have a far more efficient BPE implementation, so I thought I would share.
Best,
Matt