English Chatbot advice


(higgs) #11

I am now training three different seq2seq implementations on the same dataset: mine, OpenNMT, and DeepQA.
I also tested the practical-seq2seq you mentioned, and neuralconv in Torch.
The pretrained practical-seq2seq model could reproduce what the blog shows, but neuralconv didn’t give what I expected when trained on the whole Cornell dataset.
DeepQA is also a TensorFlow-based seq2seq implementation, and its results were similar to those of practical-seq2seq.


(srush) #12

Nice, thanks, that sounds like a great experiment. If OpenNMT does not have the lowest perplexity, I would be quite surprised, so please let us know. Quality of generation is another story though, and might require other techniques.


(higgs) #13

For now, I can show you the progress logs as follows:

[02/01/17 17:49:24 INFO] Loading data from ‘cv.movie/movie-train.t7’…
[02/01/17 17:49:30 INFO] * vocabulary size: source = 50004; target = 50004
[02/01/17 17:49:30 INFO] * additional features: source = 0; target = 0
[02/01/17 17:49:30 INFO] * maximum sequence length: source = 50; target = 51
[02/01/17 17:49:30 INFO] * number of training sentences: 178663
[02/01/17 17:49:30 INFO] * maximum batch size: 64
[02/01/17 17:49:30 INFO] Building model…
[02/01/17 17:49:32 INFO] * using input feeding
[02/01/17 17:49:34 INFO] Initializing parameters…
[02/01/17 17:49:36 INFO] * number of parameters: 119480404
[02/01/17 17:49:36 INFO] Preparing memory optimization…
[02/01/17 17:49:36 INFO] * sharing 71% of output/gradInput tensors memory between clones
[02/01/17 17:49:36 INFO] Start training…
[02/01/17 17:49:36 INFO]
[02/01/17 17:50:36 INFO] Epoch 1 ; Iteration 50/2813 ; Learning rate 0.0002 ; Source tokens/s 610 ; Perplexity 4100.64
[02/01/17 17:51:22 INFO] Epoch 1 ; Iteration 100/2813 ; Learning rate 0.0002 ; Source tokens/s 628 ; Perplexity 2082.00
[02/01/17 17:52:07 INFO] Epoch 1 ; Iteration 150/2813 ; Learning rate 0.0002 ; Source tokens/s 604 ; Perplexity 1592.03
[02/01/17 17:52:54 INFO] Epoch 1 ; Iteration 200/2813 ; Learning rate 0.0002 ; Source tokens/s 625 ; Perplexity 1373.68
[02/01/17 17:53:40 INFO] Epoch 1 ; Iteration 250/2813 ; Learning rate 0.0002 ; Source tokens/s 638 ; Perplexity 1248.30
[02/01/17 17:54:27 INFO] Epoch 1 ; Iteration 300/2813 ; Learning rate 0.0002 ; Source tokens/s 674 ; Perplexity 1170.56
[02/01/17 17:55:15 INFO] Epoch 1 ; Iteration 350/2813 ; Learning rate 0.0002 ; Source tokens/s 677 ; Perplexity 1114.62

[02/01/17 18:32:22 INFO] Epoch 1 ; Iteration 2800/2813 ; Learning rate 0.0002 ; Source tokens/s 656 ; Perplexity 496.37
[02/01/17 18:33:59 INFO] Validation perplexity: 262.27
[02/01/17 18:33:59 INFO] Saving checkpoint to ‘cv.movie/movie-model_epoch1_262.27.t7’…

[02/02/17 19:05:54 INFO] Epoch 16 ; Iteration 2750/2813 ; Learning rate 0.0002 ; Source tokens/s 212 ; Perplexity 83.07
[02/02/17 19:08:18 INFO] Epoch 16 ; Iteration 2800/2813 ; Learning rate 0.0002 ; Source tokens/s 212 ; Perplexity 83.20
[02/02/17 19:13:30 INFO] Validation perplexity: 186.03
[02/02/17 19:13:30 INFO] Saving checkpoint to ‘cv.movie/movie-model_epoch16_186.03.t7’…

As you can see, PPL has gone from 1114.62 (training, early in epoch 1) to 186.03 (validation) after 16 epochs.
Do you see anything wrong?
How many epochs do you think I have to train to reproduce the results, and what PPL should I aim for as a sign of good convergence?


(Guillaume Klein) #14

Any particular reason you are not using the default SGD optimizer?


(higgs) #15

@guillaumekln No special reason. I am not an expert in optimization; I just went by the ideas below.

ADAM computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients… and also … The method is designed to combine the advantages of two recently popular methods: Adagrad and RMSProp…
– stolen from quora

Here is another idea about Adagrad.

Adagrad adapts the learning rate to the parameters, performing larger updates for infrequent and smaller updates for frequent parameters. For this reason, it is well-suited for dealing with sparse data. Dean et al. (Google) have found that Adagrad greatly improved the robustness of SGD and used it for training large-scale neural nets at Google
– stolen from an overview of gradient descent optimization algorithms
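
To make the quoted Adagrad description concrete, here is a minimal sketch in plain Python. It is not OpenNMT code: the function name `adagrad_step`, the learning rate, and the toy gradients are all assumptions made for illustration, and some formulations place `eps` inside the square root instead. Each parameter's step is divided by the square root of its accumulated squared gradients, so a parameter that rarely receives a gradient keeps a larger effective step.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one Adagrad update in place and return the per-parameter steps."""
    steps = []
    for i, g in enumerate(grads):
        accum[i] += g * g                          # running sum of squared gradients
        step = lr * g / (math.sqrt(accum[i]) + eps)
        params[i] -= step
        steps.append(step)
    return steps

# Toy setup (hypothetical): parameter 0 receives a gradient at every iteration,
# parameter 1 only at the last one, as the embedding of a rare word might.
params = [0.0, 0.0]
accum = [0.0, 0.0]
for t in range(10):
    grads = [1.0, 1.0 if t == 9 else 0.0]
    steps = adagrad_step(params, grads, accum)

print("last step, frequent parameter:  ", round(steps[0], 4))
print("last step, infrequent parameter:", round(steps[1], 4))
```

With identical gradients on the last iteration, the infrequently updated parameter takes the larger step, which is the behaviour the quote describes. Whether that helps a given seq2seq model is a separate, empirical question.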

If you have any idea which optimizer works better, I would love to hear it.


(srush) #16

Can you try just the default options for our sake? (We haven’t found either of these claims to be true for opennmt in practice.)


(higgs) #17

Here is the result at training PPL 5.45, after training OpenNMT for 6 days.

th translate.lua -gpuid 1 -model cv.movie/movie-model_epoch116_2608.57.t7 -src data/movie/eval/eval.src.txt -output data/movie/eval/eval.pred.txt -tgt data/movie/eval/eval.dst.txt
[02/06/17 18:28:53 INFO] Loading ‘cv.movie/movie-model_epoch116_2608.57.t7’…
[02/06/17 18:29:03 INFO] SENT 1: hello!
[02/06/17 18:29:03 INFO] PRED 1: take it easy,
[02/06/17 18:29:03 INFO] PRED SCORE: -4.6531
[02/06/17 18:29:03 INFO] GOLD 1: hello!
[02/06/17 18:29:03 INFO] GOLD SCORE: -23.1047
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 2: how are you?
[02/06/17 18:29:03 INFO] PRED 2: i don’t know.
[02/06/17 18:29:03 INFO] PRED SCORE: -4.8594
[02/06/17 18:29:03 INFO] GOLD 2: i’m good.
[02/06/17 18:29:03 INFO] GOLD SCORE: -11.5371
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 3: what’s your name?
[02/06/17 18:29:03 INFO] PRED 3: i don’t know.
[02/06/17 18:29:03 INFO] PRED SCORE: -5.3800
[02/06/17 18:29:03 INFO] GOLD 3: i’m julia.
[02/06/17 18:29:03 INFO] GOLD SCORE: -20.4009
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 4: when were you born?
[02/06/17 18:29:03 INFO] PRED 4: does it? i don’t know. i don’t fucking know.
[02/06/17 18:29:03 INFO] PRED SCORE: -6.6772
[02/06/17 18:29:03 INFO] GOLD 4: july 20th.
[02/06/17 18:29:03 INFO] GOLD SCORE: -46.2317
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 5: what year were you born?
[02/06/17 18:29:03 INFO] PRED 5: not really, no.
[02/06/17 18:29:03 INFO] PRED SCORE: -3.3284
[02/06/17 18:29:03 INFO] GOLD 5: 1977.
[02/06/17 18:29:03 INFO] GOLD SCORE: -23.5193
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 6: where are you from?
[02/06/17 18:29:03 INFO] PRED 6: i don’t know.
[02/06/17 18:29:03 INFO] PRED SCORE: -4.5101
[02/06/17 18:29:03 INFO] GOLD 6: i’m out in the boonies.
[02/06/17 18:29:03 INFO] GOLD SCORE: -20.4121
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 7: are you a man or a woman?
[02/06/17 18:29:03 INFO] PRED 7: it’ll be all right.
[02/06/17 18:29:03 INFO] PRED SCORE: -4.9822
[02/06/17 18:29:03 INFO] GOLD 7: i’m a woman.
[02/06/17 18:29:03 INFO] GOLD SCORE: -17.0945
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 8: why are we here?
[02/06/17 18:29:03 INFO] PRED 8: what the hell are you talking about?
[02/06/17 18:29:03 INFO] PRED SCORE: -5.7903
[02/06/17 18:29:03 INFO] GOLD 8: i’m not sure.
[02/06/17 18:29:03 INFO] GOLD SCORE: -8.9543
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 9: okay, bye!
[02/06/17 18:29:03 INFO] PRED 9: <unk>
[02/06/17 18:29:03 INFO] PRED SCORE: -7.2702
[02/06/17 18:29:03 INFO] GOLD 9: bye.
[02/06/17 18:29:03 INFO] GOLD SCORE: -10.1101
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 10: see you later.
[02/06/17 18:29:03 INFO] PRED 10: are you sure?
[02/06/17 18:29:03 INFO] PRED SCORE: -5.4196
[02/06/17 18:29:03 INFO] GOLD 10: bye.
[02/06/17 18:29:03 INFO] GOLD SCORE: -9.0895
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 11: my name is david. what is my name?
[02/06/17 18:29:03 INFO] PRED 11: so why do you think she ran?
[02/06/17 18:29:03 INFO] PRED SCORE: -7.4106
[02/06/17 18:29:03 INFO] GOLD 11: david.
[02/06/17 18:29:03 INFO] GOLD SCORE: -32.7961
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 12: my name is john. what is my name?
[02/06/17 18:29:03 INFO] PRED 12: of course.
[02/06/17 18:29:03 INFO] PRED SCORE: -3.2876
[02/06/17 18:29:03 INFO] GOLD 12: john.
[02/06/17 18:29:03 INFO] GOLD SCORE: -19.0242
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 13: are you a leader or a follower?
[02/06/17 18:29:03 INFO] PRED 13: <unk>
[02/06/17 18:29:03 INFO] PRED SCORE: -2.1966
[02/06/17 18:29:03 INFO] GOLD 13: i’m a leader.
[02/06/17 18:29:03 INFO] GOLD SCORE: -36.1867
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 14: are you a follower or a leader?
[02/06/17 18:29:03 INFO] PRED 14: you won’t want to fuck it.
[02/06/17 18:29:03 INFO] PRED SCORE: -6.1806
[02/06/17 18:29:03 INFO] GOLD 14: i’m a leader.
[02/06/17 18:29:03 INFO] GOLD SCORE: -47.1615
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 15: who is skywalker?
[02/06/17 18:29:03 INFO] PRED 15: i don’t know.
[02/06/17 18:29:03 INFO] PRED SCORE: -5.7206
[02/06/17 18:29:03 INFO] GOLD 15: he is a here.
[02/06/17 18:29:03 INFO] GOLD SCORE: -27.4052
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 16: who is bill clinton?
[02/06/17 18:29:03 INFO] PRED 16: you mean <unk>
[02/06/17 18:29:03 INFO] PRED SCORE: -4.0159
[02/06/17 18:29:03 INFO] GOLD 16: he’s a billionaire.
[02/06/17 18:29:03 INFO] GOLD SCORE: -41.5887
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 17: is sky blue or black?
[02/06/17 18:29:03 INFO] PRED 17: she was only-
[02/06/17 18:29:03 INFO] PRED SCORE: -3.7924
[02/06/17 18:29:03 INFO] GOLD 17: blue.
[02/06/17 18:29:03 INFO] GOLD SCORE: -51.0851
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 18: does a cat have a tail?
[02/06/17 18:29:03 INFO] PRED 18: i said you <unk>
[02/06/17 18:29:03 INFO] PRED SCORE: -4.6309
[02/06/17 18:29:03 INFO] GOLD 18: yes.
[02/06/17 18:29:03 INFO] GOLD SCORE: -11.8456
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 19: does a cat have a wing?
[02/06/17 18:29:03 INFO] PRED 19: what’s that?
[02/06/17 18:29:03 INFO] PRED SCORE: -2.9575
[02/06/17 18:29:03 INFO] GOLD 19: no
[02/06/17 18:29:03 INFO] GOLD SCORE: -26.6376
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 20: can a cat fly?
[02/06/17 18:29:03 INFO] PRED 20: is that a fact?
[02/06/17 18:29:03 INFO] PRED SCORE: -2.7142
[02/06/17 18:29:03 INFO] GOLD 20: no
[02/06/17 18:29:03 INFO] GOLD SCORE: -33.1640
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 21: how many legs does a cat have?
[02/06/17 18:29:03 INFO] PRED 21: if anything else –
[02/06/17 18:29:03 INFO] PRED SCORE: -4.9662
[02/06/17 18:29:03 INFO] GOLD 21: four, i think.
[02/06/17 18:29:03 INFO] GOLD SCORE: -38.7748
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 22: how many legs does a spider have?
[02/06/17 18:29:03 INFO] PRED 22: why?
[02/06/17 18:29:03 INFO] PRED SCORE: -1.1344
[02/06/17 18:29:03 INFO] GOLD 22: three, i think.
[02/06/17 18:29:03 INFO] GOLD SCORE: -40.6665
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 23: how many legs does a centipede have?
[02/06/17 18:29:03 INFO] PRED 23:
[02/06/17 18:29:03 INFO] PRED SCORE: -0.2467
[02/06/17 18:29:03 INFO] GOLD 23: eight.
[02/06/17 18:29:03 INFO] GOLD SCORE: -23.1143
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 24: what is the color of the sky?
[02/06/17 18:29:03 INFO] PRED 24: is he coming over there?
[02/06/17 18:29:03 INFO] PRED SCORE: -3.7642
[02/06/17 18:29:03 INFO] GOLD 24: blue.
[02/06/17 18:29:03 INFO] GOLD SCORE: -41.3151
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 25: what is the color of water?
[02/06/17 18:29:03 INFO] PRED 25: i’m going over to my cabin.
[02/06/17 18:29:03 INFO] PRED SCORE: -3.6146
[02/06/17 18:29:03 INFO] GOLD 25: water.
[02/06/17 18:29:03 INFO] GOLD SCORE: -47.9572
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 26: what is the color of blood?
[02/06/17 18:29:03 INFO] PRED 26: is there a priest?
[02/06/17 18:29:03 INFO] PRED SCORE: -4.0399
[02/06/17 18:29:03 INFO] GOLD 26: it is the same as a black eye.
[02/06/17 18:29:03 INFO] GOLD SCORE: -59.0058
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 27: what is the usual color of a leaf?
[02/06/17 18:29:03 INFO] PRED 27: he’s <unk>
[02/06/17 18:29:03 INFO] PRED SCORE: -1.6805
[02/06/17 18:29:03 INFO] GOLD 27: it is a green one.
[02/06/17 18:29:03 INFO] GOLD SCORE: -40.8062
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 28: what is the color of a yellow car?
[02/06/17 18:29:03 INFO] PRED 28: i paid three hundred and fifty thousand dollars.
[02/06/17 18:29:03 INFO] PRED SCORE: -5.8376
[02/06/17 18:29:03 INFO] GOLD 28: yellow.
[02/06/17 18:29:03 INFO] GOLD SCORE: -43.5892
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 29: how much is two plus two?
[02/06/17 18:29:03 INFO] PRED 29: what?
[02/06/17 18:29:03 INFO] PRED SCORE: -0.9964
[02/06/17 18:29:03 INFO] GOLD 29: four.
[02/06/17 18:29:03 INFO] GOLD SCORE: -14.9198
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:03 INFO] SENT 30: how much is ten minus two?
[02/06/17 18:29:03 INFO] PRED 30: there’s not much to tell.
[02/06/17 18:29:03 INFO] PRED SCORE: -4.0325
[02/06/17 18:29:03 INFO] GOLD 30: seventy - two.
[02/06/17 18:29:03 INFO] GOLD SCORE: -51.2045
[02/06/17 18:29:03 INFO]
[02/06/17 18:29:08 INFO] SENT 31: what is the purpose of life?
[02/06/17 18:29:08 INFO] PRED 31: i don’t know what you’re talking about.
[02/06/17 18:29:08 INFO] PRED SCORE: -3.1786
[02/06/17 18:29:08 INFO] GOLD 31: to serve the greater good.
[02/06/17 18:29:08 INFO] GOLD SCORE: -63.6982
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 32: what is the purpose of living?
[02/06/17 18:29:08 INFO] PRED 32: what are you talking about?
[02/06/17 18:29:08 INFO] PRED SCORE: -1.1706
[02/06/17 18:29:08 INFO] GOLD 32: to live forever.
[02/06/17 18:29:08 INFO] GOLD SCORE: -34.3867
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 33: what is the purpose of existence?
[02/06/17 18:29:08 INFO] PRED 33: i would like to see that.
[02/06/17 18:29:08 INFO] PRED SCORE: -5.2952
[02/06/17 18:29:08 INFO] GOLD 33: to find out what happens when we get to the planet earth.
[02/06/17 18:29:08 INFO] GOLD SCORE: -81.7606
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 34: where are you now?
[02/06/17 18:29:08 INFO] PRED 34:
[02/06/17 18:29:08 INFO] PRED SCORE: -3.2828
[02/06/17 18:29:08 INFO] GOLD 34: i’m in the middle of nowhwere.
[02/06/17 18:29:08 INFO] GOLD SCORE: -31.3634
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 35: what is the purpose of dying?
[02/06/17 18:29:08 INFO] PRED 35: you’re that?
[02/06/17 18:29:08 INFO] PRED SCORE: -2.4893
[02/06/17 18:29:08 INFO] GOLD 35: to have a life.
[02/06/17 18:29:08 INFO] GOLD SCORE: -32.7339
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 36: what is the purpose of being intelligent?
[02/06/17 18:29:08 INFO] PRED 36: dream of them?
[02/06/17 18:29:08 INFO] PRED SCORE: -4.7567
[02/06/17 18:29:08 INFO] GOLD 36: to find out what it is.
[02/06/17 18:29:08 INFO] GOLD SCORE: -32.0949
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 37: what is the purpose of emotions?
[02/06/17 18:29:08 INFO] PRED 37: oh
[02/06/17 18:29:08 INFO] PRED SCORE: -2.3428
[02/06/17 18:29:08 INFO] GOLD 37: i don’t know.
[02/06/17 18:29:08 INFO] GOLD SCORE: -5.8058
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 38: what is moral?
[02/06/17 18:29:08 INFO] PRED 38: what do you mean?
[02/06/17 18:29:08 INFO] PRED SCORE: -5.7657
[02/06/17 18:29:08 INFO] GOLD 38: what empowered humanity, what intellectual the essence is.
[02/06/17 18:29:08 INFO] GOLD SCORE: -120.9605
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 39: waht is immoral?
[02/06/17 18:29:08 INFO] PRED 39: what do you mean?
[02/06/17 18:29:08 INFO] PRED SCORE: -5.8085
[02/06/17 18:29:08 INFO] GOLD 39: the fact that you have a child.
[02/06/17 18:29:08 INFO] GOLD SCORE: -42.7499
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 40: what is morality?
[02/06/17 18:29:08 INFO] PRED 40: what do you mean?
[02/06/17 18:29:08 INFO] PRED SCORE: -5.7657
[02/06/17 18:29:08 INFO] GOLD 40: what is altruism?
[02/06/17 18:29:08 INFO] GOLD SCORE: -15.9976
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 41: what is the definition of altruism?
[02/06/17 18:29:08 INFO] PRED 41: you could have had a lot of fun to take care of your heart?
[02/06/17 18:29:08 INFO] PRED SCORE: -10.7065
[02/06/17 18:29:08 INFO] GOLD 41: if you don’t believe in god, then you don’t know.
[02/06/17 18:29:08 INFO] GOLD SCORE: -84.3690
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 42: ok… so what is the definition of morality?
[02/06/17 18:29:08 INFO] PRED 42: you loved <unk> for three years before he was a young lady, and that one of us had been put into the wrong place for two hundred years ago. for the rest of your life.
[02/06/17 18:29:08 INFO] PRED SCORE: -33.8666
[02/06/17 18:29:08 INFO] GOLD 42: well, the truth is, you’re not a believer in god almighty.
[02/06/17 18:29:08 INFO] GOLD SCORE: -116.7992
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 43: tell me the definition of morality, i am quite upset now!
[02/06/17 18:29:08 INFO] PRED 43: it’s just my fault. i don’t think i know what you’re going to do.
[02/06/17 18:29:08 INFO] PRED SCORE: -8.5411
[02/06/17 18:29:08 INFO] GOLD 43: i’m not ashamed of being a philosopher!
[02/06/17 18:29:08 INFO] GOLD SCORE: -36.8441
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 44: tell me the definition of morality.
[02/06/17 18:29:08 INFO] PRED 44: you were very happy with me to go to the party.
[02/06/17 18:29:08 INFO] PRED SCORE: -12.0489
[02/06/17 18:29:08 INFO] GOLD 44: i don’t have ethics.
[02/06/17 18:29:08 INFO] GOLD SCORE: -58.9555
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 45: look, i need help, i need to know more about morality…
[02/06/17 18:29:08 INFO] PRED 45: what do you mean?
[02/06/17 18:29:08 INFO] PRED SCORE: -4.7268
[02/06/17 18:29:08 INFO] GOLD 45: i don’t know what ethics is.
[02/06/17 18:29:08 INFO] GOLD SCORE: -33.4809
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 46: seriously, what is morality?
[02/06/17 18:29:08 INFO] PRED 46: i’m
[02/06/17 18:29:08 INFO] PRED SCORE: -3.1093
[02/06/17 18:29:08 INFO] GOLD 46: what is the definition of living?
[02/06/17 18:29:08 INFO] GOLD SCORE: -49.1426
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 47: why living has anything to do with morality?
[02/06/17 18:29:08 INFO] PRED 47: i’d like to ask you about the claymore.
[02/06/17 18:29:08 INFO] PRED SCORE: -6.8416
[02/06/17 18:29:08 INFO] GOLD 47: you’re not a cop.
[02/06/17 18:29:08 INFO] GOLD SCORE: -15.9849
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 48: okay, i need to know how should i behave morally…
[02/06/17 18:29:08 INFO] PRED 48: i don’t want to hear any tales about colored people…
[02/06/17 18:29:08 INFO] PRED SCORE: -4.3196
[02/06/17 18:29:08 INFO] GOLD 48: i don’t know how to tell you.
[02/06/17 18:29:08 INFO] GOLD SCORE: -23.7476
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 49: is morality and ethics the same?
[02/06/17 18:29:08 INFO] PRED 49: how did you know that?
[02/06/17 18:29:08 INFO] PRED SCORE: -5.2880
[02/06/17 18:29:08 INFO] GOLD 49: yes, sir.
[02/06/17 18:29:08 INFO] GOLD SCORE: -13.7771
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 50: what are the things that i do to be immoral?
[02/06/17 18:29:08 INFO] PRED 50: i don’t know.
[02/06/17 18:29:08 INFO] PRED SCORE: -5.1100
[02/06/17 18:29:08 INFO] GOLD 50: i don’t know.
[02/06/17 18:29:08 INFO] GOLD SCORE: -5.1096
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 51: give me some examples of moral actions…
[02/06/17 18:29:08 INFO] PRED 51: if i get
[02/06/17 18:29:08 INFO] PRED SCORE: -5.2615
[02/06/17 18:29:08 INFO] GOLD 51: i’m not a moralist.
[02/06/17 18:29:08 INFO] GOLD SCORE: -23.8791
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 52: alright, morality?
[02/06/17 18:29:08 INFO] PRED 52: it’s a
[02/06/17 18:29:08 INFO] PRED SCORE: -4.5692
[02/06/17 18:29:08 INFO] GOLD 52: integrity.
[02/06/17 18:29:08 INFO] GOLD SCORE: -23.4797
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 53: what is integrity?
[02/06/17 18:29:08 INFO] PRED 53: come on.
[02/06/17 18:29:08 INFO] PRED SCORE: -1.1635
[02/06/17 18:29:08 INFO] GOLD 53: i’m sorry, i don’t know what else to say.
[02/06/17 18:29:08 INFO] GOLD SCORE: -60.7661
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 54: be moral!
[02/06/17 18:29:08 INFO] PRED 54: <unk>
[02/06/17 18:29:08 INFO] PRED SCORE: -7.9292
[02/06/17 18:29:08 INFO] GOLD 54: be a man!
[02/06/17 18:29:08 INFO] GOLD SCORE: -18.5136
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 55: i really like our discussion on morality and ethics…
[02/06/17 18:29:08 INFO] PRED 55: i’m going to give you the fucking money.
[02/06/17 18:29:08 INFO] PRED SCORE: -4.7884
[02/06/17 18:29:08 INFO] GOLD 55: and how i’m not in the mood for a philosophical debate.
[02/06/17 18:29:08 INFO] GOLD SCORE: -100.2294
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 56: what do you like to talk about?
[02/06/17 18:29:08 INFO] PRED 56:
[02/06/17 18:29:08 INFO] PRED SCORE: -2.6590
[02/06/17 18:29:08 INFO] GOLD 56: nothing.
[02/06/17 18:29:08 INFO] GOLD SCORE: -9.4885
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 57: what do you think about tesla?
[02/06/17 18:29:08 INFO] PRED 57:
[02/06/17 18:29:08 INFO] PRED SCORE: -3.4895
[02/06/17 18:29:08 INFO] GOLD 57: he’s a good conductor.
[02/06/17 18:29:08 INFO] GOLD SCORE: -30.3708
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 58: what do you think about bill gates?
[02/06/17 18:29:08 INFO] PRED 58: good night.
[02/06/17 18:29:08 INFO] PRED SCORE: -2.3046
[02/06/17 18:29:08 INFO] GOLD 58: he’s a good man.
[02/06/17 18:29:08 INFO] GOLD SCORE: -16.6840
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 59: what do you think about messi?
[02/06/17 18:29:08 INFO] PRED 59: <unk>
[02/06/17 18:29:08 INFO] PRED SCORE: -3.4895
[02/06/17 18:29:08 INFO] GOLD 59: he’s a great player.
[02/06/17 18:29:08 INFO] GOLD SCORE: -21.8029
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 60: what do you think about cleopatra?
[02/06/17 18:29:08 INFO] PRED 60: <unk>
[02/06/17 18:29:08 INFO] PRED SCORE: -3.4895
[02/06/17 18:29:08 INFO] GOLD 60: oh, she’s very regal.
[02/06/17 18:29:08 INFO] GOLD SCORE: -18.5719
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] SENT 61: what do you think about england during the reign of elizabeth?
[02/06/17 18:29:08 INFO] PRED 61: you make a mistake. in the house.
[02/06/17 18:29:08 INFO] PRED SCORE: -4.7818
[02/06/17 18:29:08 INFO] GOLD 61: it was a great place.
[02/06/17 18:29:08 INFO] GOLD SCORE: -40.2344
[02/06/17 18:29:08 INFO]
[02/06/17 18:29:08 INFO] PRED AVG SCORE: -1.0644, PRED PPL: 2.8992
[02/06/17 18:29:08 INFO] GOLD AVG SCORE: -9.3269, GOLD PPL: 11235.7718
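
The two summary lines are linked by a simple relation: the reported perplexity is the exponential of the negated average score. The short Python check below uses only the numbers from the log above. The function name is made up for illustration, and reading the average as a per-token log-probability is an assumption (it seems likely, since the per-sentence scores are much larger in magnitude).

```python
import math

def perplexity_from_avg_score(avg_log_prob):
    """Perplexity is the exponential of the negated average log-probability."""
    return math.exp(-avg_log_prob)

pred_ppl = perplexity_from_avg_score(-1.0644)   # log reports PRED PPL: 2.8992
gold_ppl = perplexity_from_avg_score(-9.3269)   # log reports GOLD PPL: 11235.7718

print(round(pred_ppl, 3), round(gold_ppl, 1))
```

The results match the logged values up to the rounding of the averages. Put this way, the model finds its own predictions very likely and the gold answers very unlikely.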


(srush) #18

Just to confirm, before ppl was 160, and now it is 5.45?

Clearly the answers are not great, but they seem to be more diverse now and some are reasonable (questions with questions). I don’t really know what good would be yet on this dataset, as evaluating chatbots is still tough.

Would you by any chance want to post your model? We would give you credit and a link.


(higgs) #19

On epoch 116,

  • PPL(training data): 5.45
  • PPL(validation data): 2608 (surprisingly at epoch 16, it was 186.03)

(jean.senellart) #20

Something is strange: you have a lot of empty answers, and none of the answers really seems to fit its question. How big is the training corpus?
What is the size of the network? Could you give the output for your epoch 16 model? The fact that your PPL is increasing on validation data means that you are overfitting: the network has started to memorize answers.
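
One practical consequence of overfitting is that the latest checkpoint is not necessarily the one to keep. The checkpoint names in the logs above embed the epoch and the validation perplexity, so a small Python sketch can pick the one with the lowest validation perplexity. The helper names and the regular expression are assumptions based on the three file names visible in this thread; a real run would have one checkpoint per epoch, and the best one may be none of these three.

```python
import re

# Checkpoint names as they appear in the logs of this thread.
checkpoints = [
    "cv.movie/movie-model_epoch1_262.27.t7",
    "cv.movie/movie-model_epoch16_186.03.t7",
    "cv.movie/movie-model_epoch116_2608.57.t7",
]

# Assumed naming scheme: ..._epoch<N>_<validation ppl>.t7
PATTERN = re.compile(r"_epoch(\d+)_(\d+(?:\.\d+)?)\.t7$")

def parse_checkpoint(path):
    """Return (epoch, validation_ppl) parsed from a checkpoint file name."""
    match = PATTERN.search(path)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {path}")
    return int(match.group(1)), float(match.group(2))

def best_checkpoint(paths):
    """Pick the checkpoint with the lowest validation perplexity."""
    return min(paths, key=lambda p: parse_checkpoint(p)[1])

best = best_checkpoint(checkpoints)
print(best, parse_checkpoint(best))
```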


(higgs) #21

The dataset is the Cornell movie corpus (222,616 pairs), as I mentioned before.
I split it into train (183,941), valid (22,162), and test (15,513).
I used 3 layers with an RNN size of 1000 and a word embedding size of 300.
You can see all the parameters below.

th preprocess.lua -train_src data/movie/train.src.txt -train_tgt data/movie/train.dst.txt -valid_src data/movie/valid.src.txt -valid_tgt data/movie/valid.dst.txt -save_data results/movie

th train.lua -gpuid 1 -data results/movie-train.t7 -save_model results/movie-model -layers 3 -rnn_size 1000 -word_vec_size 300 -brnn_merge 'concat'

And the initial result from epoch 16 is not good either.

[02/07/17 17:44:50 INFO] Loading ‘cv.movie/movie-model_epoch16_186.03.t7’…
[02/07/17 17:45:18 INFO] SENT 1: hello!
[02/07/17 17:45:18 INFO] PRED 1: you know what i mean.
[02/07/17 17:45:18 INFO] PRED SCORE: -8.7295
[02/07/17 17:45:18 INFO] GOLD 1: hello!
[02/07/17 17:45:18 INFO] GOLD SCORE: -14.9353
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 2: how are you?
[02/07/17 17:45:18 INFO] PRED 2: i don’t know.
[02/07/17 17:45:18 INFO] PRED SCORE: -6.0676
[02/07/17 17:45:18 INFO] GOLD 2: i’m good.
[02/07/17 17:45:18 INFO] GOLD SCORE: -12.0968
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 3: what’s your name?
[02/07/17 17:45:18 INFO] PRED 3: i don’t know.
[02/07/17 17:45:18 INFO] PRED SCORE: -6.0925
[02/07/17 17:45:18 INFO] GOLD 3: i’m julia.
[02/07/17 17:45:18 INFO] GOLD SCORE: -20.1079
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 4: when were you born?
[02/07/17 17:45:18 INFO] PRED 4: don’t worry about it. i don’t know what to say.
[02/07/17 17:45:18 INFO] PRED SCORE: -16.6385
[02/07/17 17:45:18 INFO] GOLD 4: july 20th.
[02/07/17 17:45:18 INFO] GOLD SCORE: -19.1772
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 5: what year were you born?
[02/07/17 17:45:18 INFO] PRED 5: not yet.
[02/07/17 17:45:18 INFO] PRED SCORE: -5.9572
[02/07/17 17:45:18 INFO] GOLD 5: 1977.
[02/07/17 17:45:18 INFO] GOLD SCORE: -6.7442
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 6: where are you from?
[02/07/17 17:45:18 INFO] PRED 6: i don’t know.
[02/07/17 17:45:18 INFO] PRED SCORE: -5.9520
[02/07/17 17:45:18 INFO] GOLD 6: i’m out in the boonies.
[02/07/17 17:45:18 INFO] GOLD SCORE: -18.8673
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 7: are you a man or a woman?
[02/07/17 17:45:18 INFO] PRED 7: <unk>
[02/07/17 17:45:18 INFO] PRED SCORE: -5.2376
[02/07/17 17:45:18 INFO] GOLD 7: i’m a woman.
[02/07/17 17:45:18 INFO] GOLD SCORE: -15.3107

(omitted)

From this result, I don’t think PPL tells us how good a conversational model is.
I wonder whether anyone has successfully reproduced Google’s results.
Does anyone have any ideas?


(jean.senellart) #22

Thanks. I will give it a try too. The fact that no answer has anything to do with its source is really weird: it seems that the only ability of the system is to generate reasonable sentences, which is what the PPL reflects.


(jean.senellart) #23

The Google experiment uses the whole OpenSubtitles corpus as training data, which is 62M sentences. I will also start on a bigger dataset.


(higgs) #24

@jean.senellart
Where you see an empty answer, it is because of the “unk” token written with angle brackets.
In this post, angle brackets are interpreted as markup, so the token was not displayed.
So please don’t read anything into the empty responses.

The real problem is the unreasonable responses.
If a bigger dataset solves this problem, that will be awesome.
Also, PPL is a measure for a language model, so it gives no indication of how reasonable a response is for a given request.
BLEU can be used for machine translation, but I’m not sure it applies here.


(jean.senellart) #25

Hi @higgs. (I fixed the <unk> display in your logs)

I built an OpenSubtitles corpus and opened a new tutorial topic for follow-up. We have a huge corpus to experiment with, so please contribute!

I will kick off a training run on my side with the following setup, close to the original one:

  • remove all sentences with <unk> in target
  • 2 layers, 4096 LSTM, no attention model.
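
The first item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pairs are whitespace-tokenized strings, out-of-vocabulary words have already been replaced by a literal `<unk>` token, and the helper name `drop_unk_targets` is hypothetical. It is not the script used for the tutorial.

```python
UNK = "<unk>"

def drop_unk_targets(pairs, unk=UNK):
    """Keep only (source, target) pairs whose target has no unknown-word token."""
    return [(src, tgt) for src, tgt in pairs if unk not in tgt.split()]

# Toy pairs shaped like the predictions shown earlier in this thread.
pairs = [
    ("okay, bye!", "<unk>"),
    ("see you later.", "are you sure?"),
    ("who is bill clinton?", "you mean <unk>"),
]

kept = drop_unk_targets(pairs)
print(len(kept), "of", len(pairs), "pairs kept")
```

Filtering on the target side only means the model never sees `<unk>` as something to generate, while sources containing rare words are still used.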

(higgs) #26

@jean.senellart

Using a larger dialog corpus would be a good option, since the Google paper mentions that they used 62M sentences.
Thanks to your dataset (14M), I am now training again.
But I could not use 2 layers with a 4096-unit LSTM because of an “out-of-memory” error on a GTX Titan (12 GB memory).
Let’s see the result in several days.


(jean.senellart) #27

On my side, 4096 fits in memory once I reduce the maximum sentence length (20 in source, 30 in target), which drops less than 2% of the sentences.
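
To check how much data such a length limit costs on another corpus, a filter like the one below can be used. It is a sketch under assumptions: lengths are counted in whitespace-separated tokens, the function name `filter_by_length` is hypothetical, and OpenNMT's own preprocessing may count lengths differently.

```python
def filter_by_length(pairs, max_src=20, max_tgt=30):
    """Keep pairs within the token limits; return (kept_pairs, dropped_fraction)."""
    kept = [
        (src, tgt)
        for src, tgt in pairs
        if len(src.split()) <= max_src and len(tgt.split()) <= max_tgt
    ]
    dropped = 1 - len(kept) / len(pairs) if pairs else 0.0
    return kept, dropped

# Toy data: one short pair, one over-long source, one over-long target.
pairs = [
    ("hello !", "hi ."),
    (" ".join(["word"] * 25), "ok ."),
    ("why ?", " ".join(["word"] * 31)),
]

kept, dropped = filter_by_length(pairs)
print(f"kept {len(kept)} pairs, dropped {dropped:.1%}")
```

Shorter sequences reduce the memory needed per batch, which is presumably why the larger network fits after this change.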


(Li hangyu) #28

Are there any online demos of models that work well? I’d like to try some queries. I also think we should release a set of good test queries, so that people can compare their model’s results with others’.


(jean.senellart) #29

You can try the model described here at this URL: http://chatbot.mysystran.com.


(Zhong Peixiang) #30

Could you open-source this tutorial as part of OpenNMT? The responses are quite nice, I think, at least in terms of grammar.