Can you clarify the tokenization process for multiple (weighted) datasets? Should one tokenizer be used for both the training and evaluation data, or should each dataset be tokenized separately?
Usually you apply the same tokenizer to all datasets, training and evaluation alike.
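A minimal sketch of what that looks like with the `datasets` and `transformers` libraries, assuming a weighted mix of two corpora; the model name (`gpt2`), the dataset names (`wikitext`, `ag_news`), and the 0.7 / 0.3 weights are placeholders for illustration only:

```python
from datasets import load_dataset, interleave_datasets
from transformers import AutoTokenizer

# One tokenizer, loaded once and reused for every dataset (train and eval),
# so all splits share the same vocabulary and special tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Two example corpora; drop extra columns so their schemas match.
ds_a = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds_b = load_dataset("ag_news", split="train").remove_columns("label")

# Weighted sampling between the two corpora (70% / 30%).
mixed_train = interleave_datasets([ds_a, ds_b], probabilities=[0.7, 0.3], seed=42)

# The same tokenize() function is applied to the weighted training mix
# and to the evaluation split.
tokenized_train = mixed_train.map(tokenize, batched=True, remove_columns=["text"])

eval_ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
tokenized_eval = eval_ds.map(tokenize, batched=True, remove_columns=["text"])
```

The weighting only affects how examples are sampled from each corpus; the tokenization itself stays identical everywhere, which is what keeps token IDs consistent between training and evaluation.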