I have committed tools for language models: you can now use
lm.lua to score and sample a corpus with language models. It supports text with features.
To try it:
th preprocess.lua -data_type monotext -train train.tok -valid valid.tok -save_data datalm
If you use features, activate
- score a corpus with a language model:
th lm.lua score -model model.t7 -src t
The output, output.txt, contains the perplexity of each sentence.
- sample with a language model:
th lm.lua sample -model model.t7 -src t [-temperature T] [-max_seq_len L]
where t contains sentence prefixes (seeds), which can be empty. Random sentences of up to L words are generated. The parameter
T, between 0 and 1, controls how random the generation is: the closer to zero, the more consistent and less random the output.
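For reference, "perplexity by sentence" in the score mode usually means the exponential of the average negative log-probability the model assigns to each token of the sentence. The sketch below assumes that standard definition; the function names and toy numbers are hypothetical, and whether lm.lua also counts an end-of-sentence token is not stated in this post.

```python
import math
from typing import List, Sequence


def sentence_perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one sentence from natural-log token probabilities.

    token_logprobs[i] is assumed to be log p(w_i | w_1..w_{i-1}) from the LM.
    Including or excluding an end-of-sentence token is left to the caller
    (an assumption, not something lm.lua is known to do either way).
    """
    if not token_logprobs:
        raise ValueError("an empty sentence has no perplexity")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


def score_corpus(corpus_logprobs: Sequence[Sequence[float]]) -> List[float]:
    """One perplexity per sentence, mirroring one line per sentence in output.txt."""
    return [sentence_perplexity(lp) for lp in corpus_logprobs]


if __name__ == "__main__":
    # Toy numbers: every token with probability 1/4 gives perplexity 4;
    # a perfectly predicted sentence (log p = 0) gives perplexity 1.
    corpus = [[math.log(0.25)] * 3, [0.0, 0.0]]
    for ppl in score_corpus(corpus):
        print(f"{ppl:.2f}")
```

Lower is better: a perplexity of 4 means the model was, on average, as uncertain as a uniform choice among 4 tokens.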
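The effect of T can be illustrated with the usual temperature-sampling scheme, assuming lm.lua divides the model's output scores by T before the softmax (the post does not say so explicitly). The `next_logits` callback, the integer token ids, the EOS convention and the choice to count the seed toward the length limit are all hypothetical stand-ins for the real model; T = 0 is treated as greedy decoding, the limit the post describes as "more consistent and less random".

```python
import math
import random
from typing import Callable, List, Sequence


def sample_with_temperature(logits: Sequence[float], temperature: float,
                            rng: random.Random) -> int:
    """Draw one token index from softmax(logits / temperature)."""
    if temperature <= 0:
        # Limit T -> 0: always pick the most likely token (greedy).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(next_logits: Callable[[List[int]], Sequence[float]],
             prefix: List[int], eos: int, temperature: float,
             max_seq_len: int, rng: random.Random) -> List[int]:
    """Extend a (possibly empty) seed prefix until EOS or max_seq_len tokens.

    Hypothetical: the seed tokens count toward max_seq_len here.
    """
    tokens = list(prefix)
    while len(tokens) < max_seq_len:
        tok = sample_with_temperature(next_logits(tokens), temperature, rng)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "model": same scores at every step; token 3 plays the role of EOS.
    toy_logits = lambda history: [2.0, 1.0, 0.5, 0.1]
    print(generate(toy_logits, prefix=[], eos=3, temperature=0.0,
                   max_seq_len=5, rng=rng))
    print(generate(toy_logits, prefix=[], eos=3, temperature=1.0,
                   max_seq_len=5, rng=rng))
```

With temperature=0.0 the toy model always picks its highest-scoring token, so the first call prints [0, 0, 0, 0, 0]; at T = 1 the distribution is flatter and the second output depends on the random seed.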
Any feedback is welcome! As a teaser, I trained an LM on the corpus from:
and the LM can generate random questions for the chatbot to answer. Just for fun, here is a sample:
LM@Bot: Where is my love?
Bot@LM: Tell me.
LM@Bot: Has Ken come back yet?
Bot@LM: no, not yet
LM@Bot: WHATS HITCHKOK EXACTLY?
Bot@LM: I DON'T KNOW.
LM@Bot: Where do you put eyes on your bitter sack then walsh?
Bot@LM: I don't.
LM@Bot: Did you and Rosemary know each other?
LM@Bot: Can I be of some assistance?
Bot@LM: no, I'm fine.
LM@Bot: who's slow?
Bot@LM: I'm slow.
LM@Bot: what the hell did dave want?
Bot@LM: I don't know.
LM@Bot: Did she say I needed money?
LM@Bot: what the hell have you been up to?
LM@Bot: Hey, anybody home?
LM@Bot: Where have Magi lived?
Bot@LM: I'll tell you.