EDD architecture - Encoder and dual decoder for image to text

Hi Everyone,

I’m trying to implement a model to extract table structure. I came across a good paper by IBM, https://arxiv.org/pdf/1911.10683, which suggests using one encoder (a CNN) and two decoders: one for structure extraction, which generates HTML tags from the image, and another for cell content extraction.

Could you help me implement the EDD architecture using OpenNMT?
Thank you in advance.

Hi there,
A few pointers:

  • for data loading you can probably use either the existing image inputter, or the vector one;
  • for encoder / decoder, you can define those in onmt.encoders and onmt.decoders, based on the already existing ones;
  • you probably need to define a specific generator as well, making use of both decoder parts.
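
To make the pointers above more concrete, here is a minimal sketch of the encoder / dual-decoder layout in plain PyTorch. It deliberately does not use any onmt classes, so it says nothing about the exact onmt interfaces. All class names (`CNNEncoder`, `AttnGRUDecoder`, `EDD`), the layer sizes, and the `cell_pos` argument are hypothetical. Seeding the cell decoder with the structure decoder's hidden state at each cell token is a simplifying assumption, not necessarily how the paper wires the two decoders together.

```python
import torch
import torch.nn as nn


class CNNEncoder(nn.Module):
    """Toy CNN that turns an image into a sequence of feature vectors."""

    def __init__(self, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, hidden, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, images):
        feats = self.conv(images)                # (B, C, H', W')
        return feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)


class AttnGRUDecoder(nn.Module):
    """GRU decoder that attends over the encoder feature sequence."""

    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, memory, init_hidden=None):
        emb = self.embed(tokens)                 # (B, T, H)
        ctx, _ = self.attn(emb, memory, memory)  # (B, T, H)
        h0 = None if init_hidden is None else init_hidden.unsqueeze(0)
        states, _ = self.gru(torch.cat([emb, ctx], dim=-1), h0)
        return self.out(states), states


class EDD(nn.Module):
    """One shared encoder, a structure decoder and a cell decoder."""

    def __init__(self, n_tags, n_chars, hidden=64):
        super().__init__()
        self.encoder = CNNEncoder(hidden)
        self.structure_decoder = AttnGRUDecoder(n_tags, hidden)
        self.cell_decoder = AttnGRUDecoder(n_chars, hidden)

    def forward(self, images, tag_in, cell_pos, cell_in):
        memory = self.encoder(images)
        tag_logits, tag_states = self.structure_decoder(tag_in, memory)
        # cell_pos: (N, 2) rows of (batch index, time step) for each cell token.
        b, t = cell_pos[:, 0], cell_pos[:, 1]
        # Assumption: the structure state at the cell token seeds the cell decoder.
        cell_logits, _ = self.cell_decoder(cell_in, memory[b], tag_states[b, t])
        return tag_logits, cell_logits


torch.manual_seed(0)
model = EDD(n_tags=10, n_chars=50)
images = torch.randn(2, 3, 32, 32)                    # two RGB table images
tag_in = torch.randint(0, 10, (2, 7))                 # HTML tag ids (teacher forcing)
cell_pos = torch.tensor([[0, 2], [0, 4], [1, 3]])     # where the cell tokens sit
cell_in = torch.randint(0, 50, (3, 5))                # character ids, one row per cell
tag_logits, cell_logits = model(images, tag_in, cell_pos, cell_in)
print(tag_logits.shape, cell_logits.shape)
```

The two logit tensors can then each be fed to their own cross-entropy loss. Porting this to OpenNMT would mean rewriting the three modules on top of the existing classes in onmt.encoders and onmt.decoders.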

Thank you @francoishernandez for your response.

Can you point me to an example I can refer to for using the existing encoders and decoders?
I tried the image-to-text tutorial, but no coding was required :slight_smile:

It would be helpful if you could point me to a link that shows how to use the onmt package.

Thank you in advance.

Here: https://github.com/OpenNMT/OpenNMT-py/tree/master/onmt
See the contents of the encoders and decoders directories for example encoder and decoder implementations.

For a quickstart on using the onmt library, you can refer to the docs.