Where are the input and output of the CNN in image2text?

I am trying to modify the Image2text model. I saw that the CNN is defined in CNN.lua, but I did not see any code for its input, output, or forward/backward passes. Where is the code that feeds input to the CNN and reads its output? Would you mind describing in more detail how it works?

Thanks a lot.

I think you want to look here:

Thanks for your answer. I read the Im2text paper: the input image is divided into non-overlapping 8x8 windows, and the CNN then extracts features from those windows. I don't understand how the code does that.
In TensorFlow, input and output tensors are defined explicitly. Is it different in Torch?
Does the following statement load the image data onto the GPU (CUDA)?
local images = inputBatch[1]:cuda()
Sorry for the simple questions; I am a beginner with Torch.

You might want to get in touch with the main author for more details.

The window size should be defined in cnn.lua. I'm not sure the current code actually uses an 8x8 window, though.
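
On the Torch side of the question, here is a minimal sketch of how inputs and outputs work in the nn package. A module does not declare input or output tensors: you pass a tensor to forward() and it returns the output, and backward() takes the same input plus the gradient of the output. The network below is a hypothetical toy CNN with made-up layer sizes, not the one defined in the Im2Text code. Its three 2x2 poolings are only an assumption, chosen to illustrate how each output position can end up corresponding to one 8-pixel step of the input.

```lua
-- Hypothetical toy example: NOT the network from CNN.lua.
-- Layer sizes and image size are invented for illustration.
require 'nn'

local cnn = nn.Sequential()
-- SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH)
cnn:add(nn.SpatialConvolution(1, 16, 3, 3, 1, 1, 1, 1))
cnn:add(nn.ReLU())
cnn:add(nn.SpatialMaxPooling(2, 2, 2, 2))   -- halves height and width
cnn:add(nn.SpatialConvolution(16, 32, 3, 3, 1, 1, 1, 1))
cnn:add(nn.ReLU())
cnn:add(nn.SpatialMaxPooling(2, 2, 2, 2))   -- total reduction so far: 4x
cnn:add(nn.SpatialConvolution(32, 64, 3, 3, 1, 1, 1, 1))
cnn:add(nn.ReLU())
cnn:add(nn.SpatialMaxPooling(2, 2, 2, 2))   -- total reduction: 8x

-- A batch of 4 grayscale images, 32 pixels high and 128 wide.
local images = torch.rand(4, 1, 32, 128)

-- The input is simply the argument of forward(); the output is its return value.
local features = cnn:forward(images)
print(features:size())   -- 4 x 64 x 4 x 16: one 64-dim vector per 8-pixel step

-- backward() takes the same input and the gradient w.r.t. the output,
-- and returns the gradient w.r.t. the input.
local gradFeatures = features:clone():fill(1)
local gradImages = cnn:backward(images, gradFeatures)

-- On a GPU (requires the cutorch and cunn packages), both the model and
-- the data are moved with :cuda(), which returns CUDA tensors:
--   cnn:cuda()
--   local cudaImages = images:cuda()
```

In this sketch, a call such as inputBatch[1]:cuda() would play the role of the last commented lines: it copies the tensor to the GPU so it can be passed to a model that has also been moved there.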

