English Chatbot model with OpenNMT


(Ajay Talati) #23

Hi folks :slight_smile: I’m new to the PyTorch version of OpenNMT and want to get started experimenting with training chatbots.

Luckily I’ve got a fair amount of computing resources available, so I can run quite a few experiments simultaneously.

Is there a minimal script (or scripts) that can help me get started, just the simplest/smallest baseline experiment I can use as a reference?

Thanks a lot for your help :slight_smile:

Kind regards,

Ajay


(jean.senellart) #24

Hi @AjayTalati, you can train a chatbot exactly as a translation model. So if you just take a subset of the data provided above (for instance 1M sentences) and use a moderate model size, for instance 2 layers, rnn_size 500, word_vec_size 500, you will get a first model pretty quickly with vanilla OpenNMT-py.
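For reference, a minimal invocation along these lines might look as follows, using the legacy OpenNMT-py `preprocess.py`/`train.py` scripts. The data paths are placeholders, and the exact flag names may differ between OpenNMT-py versions:

```shell
# Build the vocabularies and the binarized dataset (paths are placeholders)
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
    -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/chatbot

# Train a moderate 2-layer model with the sizes suggested above
python train.py -data data/chatbot -save_model chatbot-model \
    -layers 2 -rnn_size 500 -word_vec_size 500
```

For a chatbot, the "source" file holds the context utterances and the "target" file the responses, aligned line by line, exactly as parallel data in translation.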


(Zeng) #25

How did you process this corpus to get the 5M training data?


(Zeng) #26

Hi, I used the OpenNMT-py model to train on the 14M data and got the following log:
opensub_acc_29.04_ppl_81.21_e1.pt
opensub_acc_29.50_ppl_75.47_e2.pt
opensub_acc_29.84_ppl_72.77_e3.pt
opensub_acc_29.80_ppl_71.16_e4.pt
opensub_acc_29.92_ppl_70.34_e5.pt
opensub_acc_30.11_ppl_69.15_e6.pt
opensub_acc_30.11_ppl_68.81_e7.pt
opensub_acc_30.09_ppl_68.24_e8.pt
opensub_acc_30.67_ppl_65.31_e9.pt
opensub_acc_30.72_ppl_63.90_e10.pt
opensub_acc_30.87_ppl_63.28_e11.pt

The ppl trained with 5M data by @DYCSystran is about 30, so what’s your training result? Is there anything I missed, or is this just the expected result?
Thanks.


(Dyc Systran) #27

I just took the first 5M sentences. Could you please give more information about your training parameters? (optim method, rnn size, number of layers, word embedding size, etc.)


(Zeng) #28

The model comes from the original OpenNMT-py,
optim method: sgd
rnn_size: 500
num_layers: 2
word_embedding_size: 500
batch_size: 64
learning_rate: 1.0
learning_rate_decay: 0.5

do you have any suggestions?


(Dyc Systran) #29

There are no obvious problems with your set of parameters, but I would like to mention two points which may explain the difference in our results:

  • First, we are probably not using the same validation set, so the perplexities are somewhat not comparable

  • Secondly, your network may be a little bit small relative to the amount of data you used: I used 2 × 2048 (layers × rnn size) for my 5M experiment, and you are just using 2 × 500 for the whole 14M training set. You may need to try a bigger network

Hope that helps


(Zeng) #30

Thanks. I want to ask two more questions. What about your learning rate: did you adjust it during training or keep the initial value, and what was it? And how big is your loss? Mine seems extremely large: 221256 with batch_size 128. Is that a normal value, or is something wrong?


(Zeng) #31

@jean.senellart Hello, I am wondering whether the ppl reported in train.py is really perplexity? It looks like the loss between the true target and the generated target, which does not seem to be the perplexity metric.
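For what it’s worth, the ppl that training scripts like this conventionally report is the exponential of the average per-token cross-entropy (negative log-likelihood), which is why the raw summed loss can look huge while the perplexity stays moderate. A minimal sketch; the token count below is made up purely for illustration:

```python
import math

def perplexity(total_nll, num_tokens):
    """Perplexity = exp(total negative log-likelihood / number of target tokens)."""
    return math.exp(total_nll / num_tokens)

# A summed loss of 221256 over e.g. 52000 target tokens (a hypothetical count)
# yields a moderate perplexity, on the same scale as the values in the log above.
ppl = perplexity(221256, 52000)
```

So a very large raw loss is not by itself alarming: it grows with the number of tokens in the batch, while the perplexity normalizes that away.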


(Dyc Systran) #32

I used learning rate = 1, then a decay of 0.7 after epoch 10.
I used 64 for batch_size, but 128 should work too; 221256 is normal at the beginning of training, but it should decrease as training progresses.
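The schedule described above can be sketched as follows, assuming (as with OpenNMT-py’s `-start_decay_at` option) that the decay factor is applied once per epoch from epoch 11 onward; the function name and defaults here are illustrative:

```python
def lr_at_epoch(epoch, start_lr=1.0, decay=0.7, start_decay_at=10):
    """Learning rate used at a given epoch: `decay` is applied
    once per epoch after `start_decay_at`."""
    n_decays = max(0, epoch - start_decay_at)
    return start_lr * decay ** n_decays
```

For example, epochs 1 through 10 train at 1.0, epoch 11 at 0.7, epoch 12 at 0.49, and so on.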


(Zeng) #33

Hey. I have used the same training data and the same parameters (the validation set is the one @jean.senellart provided), but my result seems fairly bad compared with yours.
The first epoch only got ppl 68.215 on the validation data and 98.329 on the training data. :sob: I’m confused.


(Zhong Peixiang) #34

Any updates on this post? I have tried my own implementation of Seq2Seq with attention in PyTorch, and the perplexity is around 140; the responses are far worse than those in Google’s paper due to many syntactic errors.

Did you use the true target word as the input word to predict the next word (teacher forcing), or the previously predicted word?
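The distinction being asked about (feeding the true target word vs. the model’s previous prediction, i.e. teacher forcing vs. free running) can be sketched in plain Python. Here `step` stands in for one decoder step, and all names are hypothetical:

```python
import random

def decode(targets, step, teacher_forcing_ratio=1.0, bos="<s>"):
    """Return the sequence of input words fed to the decoder."""
    inputs, prev = [], bos
    for gold in targets:
        inputs.append(prev)
        pred = step(prev)  # model's own prediction at this step
        # Teacher forcing: feed the true target word as the next input;
        # otherwise feed the model's previous prediction.
        prev = gold if random.random() < teacher_forcing_ratio else pred
    return inputs
```

With `teacher_forcing_ratio=1.0` (common during training), the decoder always sees the gold prefix; with `0.0` (as at inference time), it only ever sees its own outputs, which is where exposure bias can hurt response quality.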


(Zhong Peixiang) #35

Do these good responses score top among the beams?


(Leena Shekhar) #36

Is it possible to share the model for this? Thank you.


(jean.senellart) #37

Hi Leena, sure. We will push the model to S3.
Best,
Jean