Where are the input and output of the CNN in Im2Text?

tintinkool · June 16, 2017, 5:49am

I am trying to modify the Im2Text model. I saw that the CNN is defined in CNN.lua, but I did not see any code for the input, the output, or the forward/backward pass in the CNN. Where is the code that feeds the input to the CNN and gets its output? Would you mind describing in more detail how it works?

Thanks a lot.
Anh

guillaumekln · June 16, 2017, 7:25am

I think you want to look here:

github.com/OpenNMT/Im2Text/blob/master/src/model.lua#L214


  images = torch.rand(self.config.valBatchSize, 1, self.config.maxImageHeight, self.config.maxImageWidth)
  targetInput = torch.IntTensor(self.config.valBatchSize, self.config.maxDecoderLength):fill(onmt.Constants.PAD)
  targetOutput = torch.IntTensor(self.config.valBatchSize, self.config.maxDecoderLength):fill(onmt.Constants.PAD)
  inputBatch = {images, targetInput, targetOutput, numNonzeros, {}}
  self:step(inputBatch, true, 1, true)
  torch.setRNGState(s)
  _G.logger:info('Stress Test ends')
end

-- one step forward (and optionally backward)
function model:step(inputBatch, isForwardOnly, beamSize, mute)
  mute = mute or false
  beamSize = beamSize or 1 -- default greedy decoding
  assert (beamSize <= self.config.targetVocabSize)
  --local images = onmt.utils.Cuda.convert(inputBatch[1])
  local images = inputBatch[1]:cuda()
  local targetInput = onmt.utils.Cuda.convert(inputBatch[2])
  local targetOutput = onmt.utils.Cuda.convert(inputBatch[3])
  local numNonzeros = inputBatch[4]
  local imagePaths = inputBatch[5]
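In Torch's nn package there are no declared input/output placeholders as in a TensorFlow graph. A module's input is simply whatever tensor you pass to module:forward(input), which computes the result, stores it in module.output and returns it; module:backward(input, gradOutput) then turns the gradient with respect to the output into module.gradInput. So in the excerpt above, the CNN input is the images tensor unpacked from inputBatch[1], and the CNN output is the return value of the CNN's forward call, which is presumably made further down in model:step (the excerpt stops before it). The sketch below mimics that convention in plain Lua, without Torch, so it runs anywhere. MockCNN, its scaling "computation" and the hypothetical step function are illustrative stand-ins, not Im2Text code.

```lua
-- Torch-free mock of the nn.Module calling convention (hypothetical, not Im2Text code).
local MockCNN = {}
MockCNN.__index = MockCNN

function MockCNN.new(scale)
  return setmetatable({ scale = scale, output = nil, gradInput = nil }, MockCNN)
end

-- Like nn.Module:forward(input): compute the result, cache it in self.output, return it.
function MockCNN:forward(input)
  local out = {}
  for i, v in ipairs(input) do
    out[i] = v * self.scale -- stand-in for the real convolution/pooling stack
  end
  self.output = out
  return out
end

-- Like nn.Module:backward(input, gradOutput): turn dLoss/dOutput into dLoss/dInput.
-- For out = scale * in, the gradient w.r.t. the input is scale * gradOutput.
function MockCNN:backward(input, gradOutput)
  local gradIn = {}
  for i, g in ipairs(gradOutput) do
    gradIn[i] = g * self.scale
  end
  self.gradInput = gradIn
  return gradIn
end

-- Hypothetical step that unpacks inputBatch in the same order as model:step.
local function step(cnn, inputBatch, isForwardOnly)
  local images = inputBatch[1]         -- CNN input
  local features = cnn:forward(images) -- CNN output
  if not isForwardOnly then
    -- In a real model gradOutput comes from the layers above the CNN;
    -- a constant gradient of 1 per feature is used here for illustration.
    local gradOutput = {}
    for i = 1, #features do gradOutput[i] = 1 end
    cnn:backward(images, gradOutput)
  end
  return features
end

-- Usage: a fake "image" of three pixel values plus placeholder targets.
local cnn = MockCNN.new(2)
local batch = { {1, 2, 3}, "targetInput", "targetOutput", 3, {} }
local features = step(cnn, batch, false)
print(table.concat(features, " "))      --> 2 4 6
print(table.concat(cnn.gradInput, " ")) --> 2 2 2
```

Because the input is just a runtime argument, the same module accepts batches of any size; this is the main practical difference from defining fixed placeholder tensors up front.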

tintinkool · June 16, 2017, 8:14am

Thanks for your answer. I read the Im2Text paper: the input image is divided into 8x8 non-overlapping windows, and the CNN then extracts features from these 8x8 windows. I don't see how the code does that.
In TensorFlow, you define input and output tensors. Is it different in Torch?
Does the following statement load the image data onto the GPU (CUDA)?
local images = inputBatch[1]:cuda()
Sorry for the simple questions; I am a beginner with Torch.
Thanks.

guillaumekln · June 16, 2017, 3:01pm

You might want to get in touch with the main author for more details.

The window size should be defined in cnn.lua, though I am not sure the current code actually uses an 8x8 window.
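A CNN does not have to cut the image into windows explicitly: its pooling layers shrink the feature map, so each cell of the final feature map summarizes one block of input pixels whose size is the product of the pooling strides along each axis (its receptive field is larger and overlaps neighbouring cells because of the convolution kernels). The sketch below computes the output grid for a hypothetical stack of pooling strides. The strides {2,2}, {2,2}, {2,1}, {1,2}, the size-preserving 3x3/padding-1 convolutions and the 50x200 image are assumptions for illustration; check cnn.lua for the layers Im2Text actually uses.

```lua
-- Hypothetical pooling strides {strideH, strideW}, one entry per pooling layer.
-- Assumes every convolution keeps the spatial size (3x3 kernel, padding 1),
-- so only the pooling layers change the feature-map size.
local pools = { {2, 2}, {2, 2}, {2, 1}, {1, 2} }

-- Returns the final feature-map size and the input block each cell covers.
local function featureGrid(height, width, pools)
  local h, w = height, width
  local blockH, blockW = 1, 1
  for _, p in ipairs(pools) do
    h = math.floor(h / p[1]) -- pooling with kernel = stride, no padding
    w = math.floor(w / p[2])
    blockH = blockH * p[1]
    blockW = blockW * p[2]
  end
  return h, w, blockH, blockW
end

local h, w, bh, bw = featureGrid(50, 200, pools)
print(string.format("feature map %dx%d, each cell <- %dx%d input block", h, w, bh, bw))
--> feature map 6x25, each cell <- 8x8 input block
```

With these assumed strides, each feature-map cell corresponds to an 8x8 block of the input, which is one way an "8x8 window" view can arise without any explicit window slicing in the code.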

Yes, inputBatch[1]:cuda() converts the image batch to a CudaTensor, i.e. copies it to GPU memory.