Can you clarify the tokenization process for multiple (weighted) datasets? Should one tokenizer be used for both the training and evaluation data, or should each dataset be tokenized separately?
Usually you apply the same tokenizer to all datasets, training and evaluation alike.
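A minimal sketch of what that looks like with the `datasets` and `transformers` libraries, assuming a weighted mix of two corpora; the model name (`gpt2`), the dataset names (`wikitext`, `ag_news`), and the 0.7 / 0.3 weights are placeholders for illustration only:

```python
from datasets import load_dataset, interleave_datasets
from transformers import AutoTokenizer

# One tokenizer, loaded once and reused for every dataset (train and eval),
# so all splits share the same vocabulary and special tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Two example corpora; drop extra columns so their schemas match.
ds_a = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds_b = load_dataset("ag_news", split="train").remove_columns("label")

# Weighted sampling between the two corpora (70% / 30%).
mixed_train = interleave_datasets([ds_a, ds_b], probabilities=[0.7, 0.3], seed=42)

# The same tokenize() function is applied to the weighted training mix
# and to the evaluation split.
tokenized_train = mixed_train.map(tokenize, batched=True, remove_columns=["text"])

eval_ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
tokenized_eval = eval_ds.map(tokenize, batched=True, remove_columns=["text"])
```

The weighting only affects how examples are sampled from each corpus; the tokenization itself stays identical everywhere, which is what keeps token IDs consistent between training and evaluation.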