Question on word vector initialization for the tokens _pad, _unknown, _go, _eos

largemaths · January 26, 2017, 8:34pm

Thank you guys for the awesome job on OpenNMT and the seq2seq-Attention implementation!

May I ask about the initialization strategy for the special tokens: _pad, _unknown, _go, _eos?
They are not trained through word embedding, and I wonder what initialization works best for you?
Thank you for your time!

guillaumekln · January 27, 2017, 8:48am

Hi,

The padding embedding can be set to the zero vector. For the others, we did not experiment with different initializations. By default, the LookupTable draws its embeddings from a normal distribution (mean 0, standard deviation 1), so you could just do the same when you prepare your embeddings.

github.com

torch/nn/blob/master/LookupTable.lua#L54


   return self
end


function LookupTable:scaleGradByFreq()
   self.shouldScaleGradByFreq = true
   return self
end


function LookupTable:reset(stdv)
   stdv = stdv or 1
   self.weight:normal(0, stdv)
end


function LookupTable:makeInputContiguous(input)
   -- make sure input is a contiguous torch.LongTensor
   if (not input:isContiguous()) or torch.type(input) ~= torch.type(self._input) then
      self.copiedInput = true
      self._input:resize(input:size()):copy(input)
      return self._input
   end
   self.copiedInput = false
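
As a rough illustration of the suggestion above, here is a minimal sketch in plain Python (not Torch Lua) of preparing vectors for the four special tokens before loading them into a model. It assumes the token names from the question; the function name `init_special_embeddings`, the dimension of 8, and the seed are hypothetical and only there for the example. `_pad` gets the zero vector, and the other tokens are drawn from a normal distribution with mean 0 and standard deviation `stdv`, which mirrors the default `stdv = 1` in `LookupTable:reset`.

```python
import random

# Token names taken from the question; adjust to match your vocabulary.
SPECIAL_TOKENS = ["_pad", "_unknown", "_go", "_eos"]


def init_special_embeddings(dim, stdv=1.0, seed=None):
    """Return a dict mapping each special token to an initial vector.

    Hypothetical helper, not part of OpenNMT or torch/nn.
    _pad is set to the zero vector; every other token is sampled from
    N(0, stdv^2), mirroring LookupTable:reset(stdv) with stdv defaulting to 1.
    """
    rng = random.Random(seed)
    vectors = {}
    for token in SPECIAL_TOKENS:
        if token == "_pad":
            vectors[token] = [0.0] * dim
        else:
            vectors[token] = [rng.gauss(0.0, stdv) for _ in range(dim)]
    return vectors


# Illustrative call: 8-dimensional vectors with a fixed seed for repeatability.
special = init_special_embeddings(dim=8, seed=0)
```

The resulting vectors can then be written alongside the pretrained word vectors in whatever format your embedding file uses. Since we did not compare initializations, treat the standard deviation as something to adjust rather than a recommended value.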