I’m looking into the accuracy of OpenNMT on the im2text problem (http://zh.opennmt.net/OpenNMT-py/im2text.html).
Since a pre-trained model exists, I downloaded it from http://lstm.seas.harvard.edu/latex/py-model.pt. Then I copied my test image into a directory and created an input file listing the image’s path within that directory; these are the values passed to -src_dir and -src in the command below.
Finally, I ran translate.py with the following flags (I realise this is running on my CPU, but again, I’m just testing it here):
python translate.py -data_type img -model py-model.pt -src_dir input -src input/manifest.txt -output input/pred.txt -max_length 500 -beam_size 5
However, this fails with the error:
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[1, 4, 77, 424] to have 3 channels, but got 4 channels instead
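Reading the error, the second dimension of input[1, 4, 77, 424] is the channel count, so the PNG presumably carries an alpha channel on top of RGB. A minimal sketch for confirming this from the file header, using only the standard library (png_channels is a hypothetical helper name, and the byte offsets assume a well-formed PNG whose first chunk is IHDR):

```python
import struct

# Samples per pixel for each PNG colour type.
# Type 3 is palette-indexed: one index per pixel, which an image
# loader may expand to RGB or RGBA when decoding.
PNG_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_channels(data: bytes) -> int:
    """Return the channel count declared in a PNG's IHDR chunk."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # Bytes 8-15 hold the chunk length and the 'IHDR' tag; the payload
    # that follows is width, height, bit depth, colour type.
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[16:26])
    return PNG_CHANNELS[colour_type]
```

Passing the first 26 bytes of input/1.png (for example, the result of open("input/1.png", "rb").read(26)) is enough; a result of 4 would mean the file is RGBA.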
Since I assume this is related to the image channels, I’ve tried -image_channel_size 3, but that didn’t change anything. Next I tried converting the image to greyscale (with convert input/1.png -set colorspace Gray -separate -average -alpha off input/1.grey.png) and re-running with -image_channel_size 1; however, this results in:
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[1, 1, 77, 424] to have 3 channels, but got 1 channels instead
So basically I just can’t get the input channels right, so I have to assume I’m missing a key preprocessing step. The documentation does show the use of preprocess.py, but this seems to cater only for training data, and it expects inputs like tgt-train.txt that I obviously don’t have for my test file.
What preprocessing needs to be run on my file 1.png in order for OpenNMT to work?
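
Both errors report a weight of size 64 3 3 3, which suggests the first convolution of the pre-trained model expects exactly 3 input channels, so neither a 4-channel nor a 1-channel image would fit. One thing that may be worth trying is flattening the PNG to plain RGB before translating. The following is an untested sketch, assuming Pillow is installed; flatten_to_rgb is a hypothetical helper name, and the white background is a guess about what the model saw during training:

```python
from PIL import Image


def flatten_to_rgb(img, background=(255, 255, 255)):
    """Return a 3-channel RGB copy of img, compositing any
    transparency onto a solid background colour."""
    rgba = img.convert("RGBA")
    canvas = Image.new("RGB", rgba.size, background)
    # The alpha band (last band of an RGBA image) acts as the paste
    # mask, so fully transparent pixels keep the background colour.
    canvas.paste(rgba, mask=rgba.split()[-1])
    return canvas
```

For instance, flatten_to_rgb(Image.open("input/1.png")).save("input/1.rgb.png") would write a 3-channel copy (the output filename is only an illustration); that file could then be listed in the manifest and translate.py re-run with -image_channel_size 3.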