Beam search understanding

ZexCeedd · April 18, 2017, 2:19am

Hi, I'm having difficulty modifying the beam search so that it only outputs unique terms. For example, if the first word generated is “hi”, it should never output “hi” again: whenever “hi” would be the top candidate, the beam should consider the second-highest candidate instead.

I tried looking at advancer.lua and other files, but it doesn’t seem straightforward to achieve this.

Also, my predictions have a maximum length of 21. Does this mean my beam size should be 21? When I try that, it won’t run and reports an out-of-memory error.

guillaumekln · April 18, 2017, 8:03am

Hi,

The scores output by the model, before normalization, are available here:

github.com/OpenNMT/OpenNMT/blob/master/onmt/translate/Beam.lua#L381


    local cumAttnProba = self._state[8]:view(self._remaining, scores:size(2), -1)
    local coveragePenalty = normalizeCoverage(cumAttnProba)

    if (scores:nDimension() > 2) then
      coveragePenalty = coveragePenalty:expand(scores:size())
    else
      coveragePenalty = coveragePenalty:viewAs(scores)
    end
    normScores = torch.add(normScores, coveragePenalty)
  end

  return normScores
end

-- Given new scores, combine that with the previous total scores and find the
-- top K hypotheses to form the next beam.
function Beam:_expandScores(scores, beamSize)
  local remaining = math.floor(scores:size(1) / beamSize)
  local vocabSize = scores:size(2)

You could bias this Tensor of size [batch x beam x vocab] to force or prevent the prediction of a set of words.
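
To make the biasing idea concrete, here is a minimal sketch in plain Python rather than Lua/Torch. It is not OpenNMT code: the function name `block_repeated_tokens`, the toy vocabulary, and the flattening of the batch and beam dimensions into one list of hypotheses are all assumptions made for illustration. The idea is simply to set the score of every already-generated token to negative infinity before the top candidates are chosen, so the next-best word is picked instead.

```python
import math

def block_repeated_tokens(scores, histories):
    """Return a biased copy of `scores` in which no hypothesis can repeat a token.

    scores:    one list of log-probabilities per hypothesis, indexed by vocab id
    histories: one list of already generated token ids per hypothesis
    (both names are hypothetical; they are not part of OpenNMT)
    """
    biased = [row[:] for row in scores]  # copy so the input stays untouched
    for row, history in zip(biased, histories):
        for token_id in history:
            row[token_id] = -math.inf  # this token can no longer be selected
    return biased

# Toy vocabulary: 0 = "hi", 1 = "there", 2 = "</s>"
scores = [[-0.1, -2.0, -3.0],   # hypothesis 0 would pick "hi" again
          [-1.5, -0.2, -2.5]]   # hypothesis 1 would pick "there" again
histories = [[0], [1]]

biased = block_repeated_tokens(scores, histories)
# Index of the best remaining token for each hypothesis
best = [max(range(len(row)), key=row.__getitem__) for row in biased]
```

Here `best` is `[1, 0]`: each hypothesis falls back to its second-highest candidate. In the Torch code the same effect would presumably be obtained by writing a large negative value into the corresponding entries of the score tensor, and a large positive bias could be used to force a word instead.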

The beam size is not related to the output length. The prediction stops when the top beam predicts the end-of-sentence token (you can bias that too) or when the length exceeds -max_seq_length.
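
The following toy beam search, again in plain Python and not taken from OpenNMT, illustrates why the two settings are independent: `beam_size` only controls how many hypotheses are kept in parallel at each step (which is also why a large value costs more memory), while the loop ends on the end-of-sentence token or on `max_length`. The scoring function `toy_step` is a hypothetical stand-in for the model.

```python
import math

EOS = 0  # assumed id of the end-of-sentence token in this toy vocabulary

def toy_step(prefix):
    """Hypothetical stand-in for the model: log-probs over a 3-token vocabulary."""
    # EOS becomes more likely as the prefix grows.
    p_eos = min(0.9, 0.2 * (len(prefix) + 1))
    rest = (1.0 - p_eos) / 2
    return [math.log(p_eos), math.log(rest), math.log(rest)]

def beam_search(step_fn, beam_size, max_length):
    beams = [(0.0, [])]  # (cumulative log-prob, tokens so far)
    while True:
        candidates = []
        for score, tokens in beams:
            if tokens and tokens[-1] == EOS:
                candidates.append((score, tokens))  # finished hypotheses carry over
                continue
            for token_id, logp in enumerate(step_fn(tokens)):
                candidates.append((score + logp, tokens + [token_id]))
        # Keep only the beam_size best hypotheses for the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        top_tokens = beams[0][1]
        # Stop when the top hypothesis ends the sentence or hits the length limit.
        if top_tokens[-1] == EOS or len(top_tokens) >= max_length:
            return top_tokens

result = beam_search(toy_step, beam_size=2, max_length=21)
truncated = beam_search(toy_step, beam_size=2, max_length=1)
```

With a beam of only 2, `result` still ends on the end-of-sentence token well before 21 tokens, and `truncated` is cut off after a single token by the length limit alone.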

vito.mandorino · April 19, 2017, 7:51am

I don’t fully understand the scope of the -pre_filter_factor option, but maybe it could be used to discard translation hypotheses based on a customised filter?

guillaumekln · April 21, 2017, 10:23am

If the hypothesis filtering is aggressive, you may want to consider more hypotheses before applying it by using the -pre_filter_factor option. This way, you can still work with -beam_size hypotheses after the filter.
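
As a rough illustration of that description, the sketch below shortlists `beam_size * pre_filter_factor` candidates, applies a filter, and then keeps at most `beam_size` survivors. It is a plain Python reading of the explanation above, not OpenNMT's implementation; `select_beam`, `no_repeats`, and the candidate data are hypothetical.

```python
def select_beam(candidates, beam_size, pre_filter_factor, keep):
    """Pick up to beam_size hypotheses that pass the `keep` filter.

    candidates: list of (score, tokens) pairs
    keep:       predicate on tokens, returns True if the hypothesis is allowed
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    # Look at more than beam_size candidates before filtering.
    shortlist = ranked[:beam_size * pre_filter_factor]
    survivors = [c for c in shortlist if keep(c[1])]
    return survivors[:beam_size]

def no_repeats(tokens):
    """Example filter: reject hypotheses that contain the same word twice."""
    return len(set(tokens)) == len(tokens)

candidates = [
    (-0.1, ["hi", "hi"]),
    (-0.2, ["a", "a"]),
    (-0.5, ["hi", "there"]),
    (-0.9, ["a", "b"]),
    (-1.2, ["b", "b"]),
    (-1.5, ["hi", "a"]),
]

without_prefilter = select_beam(candidates, beam_size=2, pre_filter_factor=1, keep=no_repeats)
with_prefilter = select_beam(candidates, beam_size=2, pre_filter_factor=3, keep=no_repeats)
```

With a factor of 1, both shortlisted hypotheses are rejected and the beam ends up empty. With a factor of 3, the filter still leaves two hypotheses, so the search can continue with a full beam.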