I have 1,200,000+ sentence pairs plus a large dictionary containing more than 1,300,000 phrase pairs. All my data has been checked several times and is very clean. If I want to get a good model for industry use, which options should I use? I'm currently training a model with 6 layers and 1000 RNN units per layer, and I'm not sure whether I should use a larger dropout rate or vocabulary size. I have plenty of GPU memory available, thanks of course to OpenNMT's efficient memory usage.
Thank you if you could give me some advice.
Hello - 4 layers, a bidirectional RNN, and a word embedding size of 800 should work well. You might also consider adding open-source data, even of lesser quality, to increase the size of your training set a bit more.
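For concreteness, here is roughly how those settings could map onto a training command. This is a minimal sketch assuming the Lua OpenNMT `train.lua` interface; the `-data` and `-save_model` paths are placeholders for your own preprocessed files, the RNN size of 1000 is simply carried over from your current setup, and dropout is left at the common 0.3 default since nothing above suggests raising it:

```
# Sketch of the suggested configuration (paths are placeholders).
th train.lua -data data/demo-train.t7 -save_model demo-model \
   -layers 4 \
   -rnn_size 1000 \
   -brnn \
   -word_vec_size 800 \
   -dropout 0.3 \
   -gpuid 1
```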