How to use Image to Text functionality in OpenNMT-py?

I have an image captioning dataset and want to train an OpenNMT-py model on it for the image captioning task. I can't find any guidelines for this task in the documentation.
Some explanation and code is given at the following link, but only for the legacy version.
Can I do image-to-text in the latest version of OpenNMT-py, or do I need to install the legacy version from source to use this functionality?

Thank you for your help!

These features were not ported to v2. We kept the legacy docs for anyone who would still like to try them out.

Thank you. I will try installing it from source on GitHub using the 'legacy' branch.

You can also install v1.2 with pip if you prefer.
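If it helps, here is a rough sketch for checking which OpenNMT-py version ended up in your environment, so you know whether you are on the legacy (v1.x) line before trying the image-to-text docs. The helper names `is_legacy_version` and `installed_opennmt_version` are made up for illustration, and the sketch assumes the package is registered under the distribution name "OpenNMT-py" and uses a plain dotted version string such as "1.2.0".

```python
from importlib import metadata
from typing import Optional


def is_legacy_version(version: str) -> bool:
    """Return True when the major version is below 2 (the legacy line)."""
    # Assumes a dotted version string whose first field is an integer.
    major = int(version.split(".")[0])
    return major < 2


def installed_opennmt_version() -> Optional[str]:
    """Return the installed OpenNMT-py version, or None if it is not installed."""
    try:
        # "OpenNMT-py" is assumed to be the installed distribution name.
        return metadata.version("OpenNMT-py")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_opennmt_version()
    if version is None:
        print("OpenNMT-py is not installed in this environment.")
    elif is_legacy_version(version):
        print(f"Found legacy OpenNMT-py {version}.")
    else:
        print(f"Found OpenNMT-py {version}; image-to-text was not ported to v2.")
```

Run it with the same Python interpreter you installed the package into, otherwise it will report that nothing is installed.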

Hello, could you share your image-to-text code? I am a beginner and have run into some problems.