Hello everyone. I ran into a problem while training a zh-en translation model.
My dataset consists of a 1-million-line training corpus, a 10-thousand-line validation set, and a 10-thousand-line test set. The Chinese side of the parallel corpus is tokenized with Jieba and Moses; the English side is tokenized with Moses only.
I preprocessed the data with src_seq_length and tgt_seq_length set to 80, and trained with all default parameters.
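For reference, a rough sketch of the commands I ran (assuming OpenNMT-py; the file paths and model name are placeholders, not my actual ones):

```
# Preprocess: drop sentence pairs longer than 80 tokens on either side
python preprocess.py -train_src data/train.zh -train_tgt data/train.en \
    -valid_src data/valid.zh -valid_tgt data/valid.en \
    -src_seq_length 80 -tgt_seq_length 80 \
    -save_data data/zh-en

# Train with default parameters on a single GPU
python train.py -data data/zh-en -save_model models/zh-en \
    -world_size 1 -gpu_ranks 0
```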
Training itself runs normally; however, the process always fails with “CUDA: out of memory” during validation. Could anyone give me some advice on how to address this problem? Thanks so much!
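One thing I am wondering about is the validation batch size. If I read the options correctly, train.py has a -valid_batch_size flag (default 32 sentences, I believe), so this is what I am considering trying, under that assumption:

```
# Same training command as above, but with a smaller validation batch
# (-valid_batch_size defaults to 32 if I read the docs right)
python train.py -data data/zh-en -save_model models/zh-en \
    -world_size 1 -gpu_ranks 0 \
    -valid_batch_size 8
```

Would this be the right direction, or is something else likely to cause an OOM only at validation time?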