I’m looking into the accuracy of OpenNMT on the im2text problem (http://zh.opennmt.net/OpenNMT-py/im2text.html).
Since a pre-trained model exists, I downloaded it from http://lstm.seas.harvard.edu/latex/py-model.pt. I then copied my test image into a directory (`input/`) and created an input file listing the image's path within that directory (`input/manifest.txt`).
Finally, I ran `translate.py` with the following flags (I realise this is running on my CPU, but again, I'm just testing it here):

```
python translate.py -data_type img -model py-model.pt -src_dir input -src input/manifest.txt -output input/pred.txt -max_length 500 -beam_size 5
```
However, this fails with the error:

```
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[1, 4, 77, 424] to have 3 channels, but got 4 channels instead
```
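For what it's worth, checking the channel count directly seems to confirm the image is RGBA (a quick sketch using Pillow; `Image.new` here just stands in for opening my actual `input/1.png`):

```python
from PIL import Image

# Stand-in for Image.open("input/1.png") -- my PNG appears to be RGBA
img = Image.new("RGBA", (424, 77))

print(img.mode)             # "RGBA"
print(len(img.getbands()))  # 4 -- matching the "got 4 channels" in the error
```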
Since I assume this is related to the image channels, I tried `-image_channel_size 3`, but that didn't change anything. Next I tried converting the image to greyscale (with `convert input/1.png -set colorspace Gray -separate -average -alpha off input/1.grey.png`) and re-running with `-image_channel_size 1`; however, this results in:

```
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[1, 1, 77, 424] to have 3 channels, but got 1 channels instead
```
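In case it's relevant, I could also strip the alpha channel down to the 3 RGB channels the model seems to expect (a sketch with Pillow; again `Image.new` stands in for my real `input/1.png`, and I haven't confirmed this is the intended preprocessing):

```python
from PIL import Image

# Stand-in for Image.open("input/1.png") -- assumed to be RGBA
img = Image.new("RGBA", (424, 77), (255, 255, 255, 255))

rgb = img.convert("RGB")    # drops the alpha channel
print(rgb.mode)             # "RGB"
print(len(rgb.getbands()))  # 3

# On the real file I'd then do rgb.save("input/1.png") before re-running translate.py
```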
So basically I just can't get the input channels right, which makes me think I'm missing a key preprocessing step. The documentation does show the use of `preprocess.py`, but it seems to cater only for training data, and it expects inputs like `tgt-train.txt` that I obviously don't have for my test file.
What preprocessing needs to be run on my file `1.png` in order for OpenNMT to work?