Hi everyone, I am trying to add biased decoding to the model, but I am unsure how to do this. Basically, my model is trying to add punctuation to the source sentences, so the only ‘words’ I want it to output are either the original word or, let’s say, a comma (for simplicity’s sake). Is there any way to bias decoding so that the only words the model considers are the original word and the comma, and nothing else?
By the way, this is how it was done in TensorFlow: http://atpaino.com/2017/01/03/deep-text-correcter.html
However, I am not that familiar with Lua and Torch yet, so I’m not sure how this can be translated to this model.
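To make the goal concrete, here is a minimal, language-agnostic sketch in plain Python of the kind of biasing described above: every vocabulary entry that is neither a source token nor the comma has its score set to negative infinity before the best token is picked. The names `mask_scores` and `pick_token` and the toy vocabulary are made up for illustration; this is not the model's API and not necessarily how the linked TensorFlow post does it.

```python
import math

def mask_scores(scores, source_ids, extra_ids):
    """Return a copy of `scores` with every disallowed vocabulary id set to -inf."""
    allowed = set(source_ids) | set(extra_ids)
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]

def pick_token(scores, source_ids, extra_ids):
    """Pick the highest-scoring id among the allowed ones (hypothetical helper)."""
    masked = mask_scores(scores, source_ids, extra_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy vocabulary (an assumption): 0 = <unk>, 1 = ",", 2 = "c", 3 = "d", 4 = "zebra"
scores = [0.1, 0.3, 0.2, 0.05, 0.9]

# "zebra" has the highest raw score, but it is neither in the source nor the comma.
best = pick_token(scores, source_ids=[2, 3], extra_ids=[1])
print(best)  # 1, i.e. the comma
```

The same idea should carry over to log-probabilities, since a score of negative infinity can never win against a finite one.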
Oh yes, this is something that we are interested in as well.
The new beam search should make this possible, although you will have to modify the code.
In particular see this function:
It will allow you to return a (batchSize * beamSize) tensor at each step of beam search. If you return false on “bad” outputs, it will ignore them.
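As a rough sketch of what such a per-step check could compute (plain Python, not the actual beam search code): one flag per hypothesis in a flat list of length batchSize * beamSize, with false marking a “bad” output as described above. The function name `keep_flags`, the idea of passing in the last token of each hypothesis, and the flat layout (all beams of sentence 0 first, then sentence 1, and so on) are assumptions for illustration.

```python
def keep_flags(last_tokens, allowed_per_sentence, beam_size):
    """Return one boolean per hypothesis; False marks a "bad" output.

    last_tokens: flat list of length batch_size * beam_size holding the most
        recent token id of each hypothesis.
    allowed_per_sentence: one set of allowed token ids per sentence in the batch.
    """
    flags = []
    for flat_index, token in enumerate(last_tokens):
        # Assumed layout: consecutive blocks of `beam_size` belong to one sentence.
        sentence = flat_index // beam_size
        flags.append(token in allowed_per_sentence[sentence])
    return flags

# Two sentences in the batch, three beams each.
last_tokens = [1, 4, 2, 3, 5, 1]
allowed = [{1, 2}, {1, 3}]

flags = keep_flags(last_tokens, allowed, beam_size=3)
print(flags)  # [True, False, True, True, False, True]
```

If the real function expects the opposite convention (true meaning "prune"), the flags would simply need to be negated.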
Hello, thanks for your response! I tried to do what you suggested for a looong time, but haven’t really gotten anywhere, as I really can’t understand the code. Let’s say I only want it to output ‘a’ or ‘b’, which have indices 1 and 2 respectively in the dict. How do I check whether the token (which I understand is a tensor?) is equal to either ‘a’ or ‘b’?
Also, let’s say the original sentence is ‘c d e f’. Is the token the predicted word or the original word? And if it is the predicted word, is there a way to access the original word/sentence data?
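For the first question, the check I have in mind looks roughly like this in plain Python, where a list stands in for the tensor of token ids: compare every element against each wanted index and OR the results together. The helper name `is_one_of` is made up, and whether the tensor holds predicted or original tokens is exactly what I am unsure about, so this only illustrates the comparison itself.

```python
def is_one_of(tokens, wanted_ids):
    """Elementwise check: True where a token id equals any of `wanted_ids`."""
    mask = [False] * len(tokens)
    for wanted in wanted_ids:
        # OR the running mask with an elementwise equality test for this id.
        mask = [m or (t == wanted) for m, t in zip(mask, tokens)]
    return mask

# 'a' has index 1 and 'b' has index 2 in the dict; the other ids are arbitrary.
tokens = [1, 3, 2, 2, 7]

mask = is_one_of(tokens, [1, 2])
print(mask)  # [True, False, True, True, False]
```

I assume the tensor version would use an elementwise equality comparison in the same way, one comparison per wanted index, combined into a single mask.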