The dataset is the Cornell movie corpus (222,616 pairs), as I mentioned before.
I separated it into train (183,941), valid (22,162), and test (15,513) sets.
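For reference, this is roughly how such a split could be produced; a minimal sketch, assuming the corpus has already been flattened into one tab-separated source/target pair per line (the file names are hypothetical, not the ones I actually used):

```python
import random

# Hypothetical input file: one tab-separated source/target pair per line.
with open("cornell_pairs.txt", encoding="utf-8") as f:
    pairs = [line.rstrip("\n") for line in f]

random.seed(1234)   # make the split reproducible
random.shuffle(pairs)

n_valid, n_test = 22162, 15513
valid = pairs[:n_valid]
test = pairs[n_valid:n_valid + n_test]
train = pairs[n_valid + n_test:]   # the remaining pairs are used for training

for name, split in [("train", train), ("valid", valid), ("test", test)]:
    with open("movie-%s.txt" % name, "w", encoding="utf-8") as f:
        f.write("\n".join(split) + "\n")
```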
I used 3 layers with an RNN size of 1000 and a word embedding size of 300.
You can see the whole set of parameters below.
And the initial result from epoch 16 is not good either.
[02/07/17 17:44:50 INFO] Loading 'cv.movie/movie-model_epoch16_186.03.t7'...
[02/07/17 17:45:18 INFO] SENT 1: hello!
[02/07/17 17:45:18 INFO] PRED 1: you know what i mean.
[02/07/17 17:45:18 INFO] PRED SCORE: -8.7295
[02/07/17 17:45:18 INFO] GOLD 1: hello!
[02/07/17 17:45:18 INFO] GOLD SCORE: -14.9353
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 2: how are you?
[02/07/17 17:45:18 INFO] PRED 2: i don't know.
[02/07/17 17:45:18 INFO] PRED SCORE: -6.0676
[02/07/17 17:45:18 INFO] GOLD 2: i'm good.
[02/07/17 17:45:18 INFO] GOLD SCORE: -12.0968
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 3: what's your name?
[02/07/17 17:45:18 INFO] PRED 3: i don't know.
[02/07/17 17:45:18 INFO] PRED SCORE: -6.0925
[02/07/17 17:45:18 INFO] GOLD 3: i'm julia.
[02/07/17 17:45:18 INFO] GOLD SCORE: -20.1079
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 4: when were you born?
[02/07/17 17:45:18 INFO] PRED 4: don't worry about it. i don't know what to say.
[02/07/17 17:45:18 INFO] PRED SCORE: -16.6385
[02/07/17 17:45:18 INFO] GOLD 4: july 20th.
[02/07/17 17:45:18 INFO] GOLD SCORE: -19.1772
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 5: what year were you born?
[02/07/17 17:45:18 INFO] PRED 5: not yet.
[02/07/17 17:45:18 INFO] PRED SCORE: -5.9572
[02/07/17 17:45:18 INFO] GOLD 5: 1977.
[02/07/17 17:45:18 INFO] GOLD SCORE: -6.7442
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 6: where are you from?
[02/07/17 17:45:18 INFO] PRED 6: i don't know.
[02/07/17 17:45:18 INFO] PRED SCORE: -5.9520
[02/07/17 17:45:18 INFO] GOLD 6: i'm out in the boonies.
[02/07/17 17:45:18 INFO] GOLD SCORE: -18.8673
[02/07/17 17:45:18 INFO]
[02/07/17 17:45:18 INFO] SENT 7: are you a man or a woman?
[02/07/17 17:45:18 INFO] PRED 7: <unk>
[02/07/17 17:45:18 INFO] PRED SCORE: -5.2376
[02/07/17 17:45:18 INFO] GOLD 7: i'm a woman.
[02/07/17 17:45:18 INFO] GOLD SCORE: -15.3107
…
(omitted)
From this result, I don't think that PPL guarantees how good the conversational model is.
I wonder if anyone has successfully reproduced Google's result.
Does anyone have any ideas?
Thanks. I will give it a try too - the fact that none of the answers have anything to do with the source is really weird - it seems that the only ability of the system is to generate reasonable sentences, which is what shows up in the PPL.
@jean.senellart
Where you see no answer, it is because of the '<unk>' token, which is wrapped in angle brackets.
On this forum, the bracket is interpreted as a blockquote mark, so the token does not display.
So don't read anything into an empty response.
The real problem is the unreasonable responses.
If this problem can be solved with a bigger dataset, that would be awesome.
And PPL is just a measurement score for a language model, so it doesn't say anything about how reasonable a response is given a request.
BLEU can be used for machine translation, but I'm not sure it is appropriate here.
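To make that concrete, here is a small sketch (assuming NLTK is installed; the numbers are made up for illustration) of what the two metrics actually measure. PPL is just the exponential of the average negative log-likelihood per token, and BLEU only counts n-gram overlap with a reference reply, so neither says whether a reply is a sensible answer to the request:

```python
import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Per-token log-probabilities of a predicted reply (made-up numbers).
token_logprobs = [-1.2, -0.8, -2.1, -1.5, -0.4]

# Perplexity: exp of the average negative log-likelihood per token.
ppl = math.exp(-sum(token_logprobs) / len(token_logprobs))
print("perplexity: %.2f" % ppl)

# BLEU: n-gram overlap between the prediction and a reference reply.
reference = "i 'm good .".split()
prediction = "i don 't know .".split()
bleu = sentence_bleu([reference], prediction,
                     smoothing_function=SmoothingFunction().method1)
print("BLEU: %.4f" % bleu)
```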
Using a larger dialogue corpus would be a good option, since the Google paper mentioned they used 62M sentences.
Thanks to your dataset (14M), I am now training again.
But I could not use 2 layers with 4096 LSTM units because of 'out-of-memory' errors on a GTX Titan (12 GB memory).
Let's see the result after several days.
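Just as a rough back-of-the-envelope check (the vocabulary and embedding sizes below are assumptions, not the exact settings, and attention/input-feeding parameters are ignored), the parameter count of a 2 x 4096 seq2seq model is already in the hundreds of millions before gradients, optimizer state, and activations are counted, which is probably why 12 GB is not enough:

```python
def lstm_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

hidden, emb, vocab, layers = 4096, 1000, 50000, 2   # assumed sizes

encoder = lstm_params(emb, hidden) + (layers - 1) * lstm_params(hidden, hidden)
decoder = lstm_params(emb, hidden) + (layers - 1) * lstm_params(hidden, hidden)
embeddings = 2 * vocab * emb            # source and target embedding tables
generator = vocab * hidden + vocab      # output projection onto the vocabulary

total = encoder + decoder + embeddings + generator
print("~%dM parameters, ~%.1f GB in fp32 (weights only)"
      % (total / 1e6, total * 4 / 1e9))
```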
Are there any online demos of any well-working models? I'd like to try some queries. And I think we should publish a set of good test queries, so one can compare the results of their model against others'.
Hi @jean.senellart
I have trained OpenNMT as a chatbot.
Now I want to put this chatbot in production, but I cannot find a way to do so.
One option I found was the translation server that is provided. But is it suitable for production use?
Also, could you please share the source code for building an interface similar to http://chatbot.mysystran.com/?
I am quite impressed by it and would like a similar interface for my work.
I would be highly obliged if you could help me with this.
Yes, it is suitable for light to moderate usage. If your requirements are higher than that, you should probably have the resources to build something on your own.
Except for the underlying OpenNMT technology, this demo is not open source.
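If it helps for the interface part, the stock REST translation server can be queried from any small HTTP client; here is a minimal sketch in Python, assuming the REST server from the OpenNMT tools is running locally (the port and route are the defaults I recall from its documentation, so adjust them to your setup, and remember the input must be tokenized the same way as the training data):

```python
import requests

# Assumed defaults for the OpenNMT REST translation server; change if yours differ.
URL = "http://localhost:7784/translator/translate"

def ask(sentence):
    # The server expects a JSON list of {"src": ..., "id": ...} objects and
    # returns a JSON structure containing the predictions.
    payload = [{"src": sentence, "id": 1}]
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    for query in ["hello !", "how are you ?", "what 's your name ?"]:
        print(query, "->", ask(query))
```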
Hi @guillaumekln,
thanks for the reply.
OK, I understand that the demo is not open source. Can you at least tell me whether you used the same translation server that is available, or something else, to build your demo interface?