English Chatbot model with OpenNMT


(Ajay Talati) #23

Hi folks :slight_smile: I’m new to the PyTorch version of OpenNMT and want to get started experimenting with training chatbots.

Luckily I’ve got a fair amount of computing resources available, so I can run quite a few experiments simultaneously.

Is there a minimal script (or scripts) that can help me get started, just the simplest/smallest baseline experiment I can use as a reference?

Thanks a lot for your help :slight_smile:

Kind regards,

Ajay


(jean.senellart) #24

Hi @AjayTalati, you can train a chatbot exactly as a translation model. So if you just take a subset of the data provided above (for instance 1M sentences) and use a moderate model size, for instance 2 layers, rnn_size 500, word_vec_size 500, you will get a first model pretty quickly with vanilla OpenNMT-py.
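For reference, a minimal invocation along these lines might look as follows, using the legacy OpenNMT-py `preprocess.py`/`train.py` scripts. The data paths are placeholders, and the exact flag names may differ between OpenNMT-py versions:

```shell
# Build the vocabularies and the binarized dataset (paths are placeholders)
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
    -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/chatbot

# Train a moderate 2-layer model with the sizes suggested above
python train.py -data data/chatbot -save_model chatbot-model \
    -layers 2 -rnn_size 500 -word_vec_size 500
```

For a chatbot, the "source" file holds the context utterances and the "target" file the responses, aligned line by line, exactly as parallel data in translation.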


(Zeng) #25

How did you process this corpus to get the 5M training data?


(Zeng) #26

Hi, I used the OpenNMT-py model to train on the 14M data and got the following log:
opensub_acc_29.04_ppl_81.21_e1.pt
opensub_acc_29.50_ppl_75.47_e2.pt
opensub_acc_29.84_ppl_72.77_e3.pt
opensub_acc_29.80_ppl_71.16_e4.pt
opensub_acc_29.92_ppl_70.34_e5.pt
opensub_acc_30.11_ppl_69.15_e6.pt
opensub_acc_30.11_ppl_68.81_e7.pt
opensub_acc_30.09_ppl_68.24_e8.pt
opensub_acc_30.67_ppl_65.31_e9.pt
opensub_acc_30.72_ppl_63.90_e10.pt
opensub_acc_30.87_ppl_63.28_e11.pt

The ppl trained with 5M data by @DYCSystran is about 30, so what’s your training result? Is there anything I missed, or is this just the expected result?
Thanks.


(Dyc Systran) #27

I just took the first 5M sentences. Could you please give more information about your training parameters? (optim method, rnn size, number of layers, word embedding size, etc.)


(Zeng) #28

The model comes from the original OpenNMT-py,
optim method: sgd
rnn_size: 500
num_layers: 2
word_embedding_size: 500
batch_size: 64
learning_rate: 1.0
learning_rate_decay: 0.5

do you have any suggestions?


(Dyc Systran) #29

There are no obvious problems with your set of parameters, but I would like to mention two points which may explain the difference in our results:

  • First, we are probably not using the same validation set, so the perplexities are somewhat not comparable

  • Secondly, your network may be a little bit small relative to the amount of data you used: I used 2 × 2048 (layers × rnn size) for my 5M experiment, and you are just using 2 × 500 for the whole 14M training set. You may need to try a bigger network

Hope that helps


(Zeng) #30

Thanks. I want to ask two more questions. What about your learning rate: did you adjust it during training or keep the initial value, and what was it? And how big is your loss? Mine seems extremely large: 221256 with batch_size 128. Is that a normal value, or is something wrong?


(Zeng) #31

@jean.senellart Hello, I am wondering whether the ppl reported in train.py is really perplexity? It looks like the loss between the true target and the generated target, which does not seem to be the perplexity metric.
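For what it’s worth, the ppl that training scripts like this conventionally report is the exponential of the average per-token cross-entropy (negative log-likelihood), which is why the raw summed loss can look huge while the perplexity stays moderate. A minimal sketch; the token count below is made up purely for illustration:

```python
import math

def perplexity(total_nll, num_tokens):
    """Perplexity = exp(total negative log-likelihood / number of target tokens)."""
    return math.exp(total_nll / num_tokens)

# A summed loss of 221256 over e.g. 52000 target tokens (a hypothetical count)
# yields a moderate perplexity, on the same scale as the values in the log above.
ppl = perplexity(221256, 52000)
```

So a very large raw loss is not by itself alarming: it grows with the number of tokens in the batch, while the perplexity normalizes that away.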


(Dyc Systran) #32

I used learning rate = 1, then a decay of 0.7 after epoch 10.
I used 64 for batch_size, but 128 should work too; 221256 is normal at the beginning of training, but it should decrease as training progresses.
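The schedule described above can be sketched as follows, assuming (as with OpenNMT-py’s `-start_decay_at` option) that the decay factor is applied once per epoch from epoch 11 onward; the function name and defaults here are illustrative:

```python
def lr_at_epoch(epoch, start_lr=1.0, decay=0.7, start_decay_at=10):
    """Learning rate used at a given epoch: `decay` is applied
    once per epoch after `start_decay_at`."""
    n_decays = max(0, epoch - start_decay_at)
    return start_lr * decay ** n_decays
```

For example, epochs 1 through 10 train at 1.0, epoch 11 at 0.7, epoch 12 at 0.49, and so on.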


(Zeng) #33

Hey. I have used the same training data and the same parameters (the validation set is the one @jean.senellart provided), but my result seems fairly bad compared with yours.
The first epoch only got ppl 68.215 on the validation data and 98.329 on the training data. :sob: I’m confused.


(Zhong Peixiang) #34

Any updates on this post? I have tried my own implementation of Seq2Seq with attention in PyTorch, and the perplexity is around 140; the responses are far worse than those in Google’s paper due to many syntactic errors.

Did you use the true target word as the input word to predict the next word (teacher forcing), or the previously predicted word?
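The distinction being asked about (feeding the true target word vs. the model’s previous prediction, i.e. teacher forcing vs. free running) can be sketched in plain Python. Here `step` stands in for one decoder step, and all names are hypothetical:

```python
import random

def decode(targets, step, teacher_forcing_ratio=1.0, bos="<s>"):
    """Return the sequence of input words fed to the decoder."""
    inputs, prev = [], bos
    for gold in targets:
        inputs.append(prev)
        pred = step(prev)  # model's own prediction at this step
        # Teacher forcing: feed the true target word as the next input;
        # otherwise feed the model's previous prediction.
        prev = gold if random.random() < teacher_forcing_ratio else pred
    return inputs
```

With `teacher_forcing_ratio=1.0` (common during training), the decoder always sees the gold prefix; with `0.0` (as at inference time), it only ever sees its own outputs, which is where exposure bias can hurt response quality.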


(Zhong Peixiang) #35

Do these good responses score top among the beams?


(Leena Shekhar) #36

Is it possible to share the model for this? Thank you.


(jean.senellart) #37

Hi Leena, sure. We will push the model to S3.
Best,
Jean