EDD architecture - Encoder and dual decoder for image to text

Hi Everyone,

I’m trying to implement a model that extracts table structure from images. I came across a good paper by IBM that suggests using a single CNN encoder with two decoders: one decoder for structure extraction, which generates HTML tags from the image, and another decoder for cell content extraction.

Could you help me implement the EDD architecture using OpenNMT?
Thank you in advance.

Hi there,
A few pointers:

  • for data loading you can probably use either the existing image inputter, or the vector one;
  • for encoder / decoder, you can define those in onmt.encoders and onmt.decoders, based on the already existing ones;
  • you probably need to define a specific generator as well, making use of both decoder parts.
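To make the pointers above a bit more concrete, here is a minimal sketch in plain PyTorch of one shared CNN encoder feeding two decoders. It is only an illustration under stated assumptions: the class names (CNNEncoder, TokenDecoder, EDD), the joint_loss helper, and its weight parameter are all hypothetical and are not part of the onmt API, and the decoders are simple GRUs without attention. To use it inside OpenNMT you would still have to adapt it to the existing classes in onmt.encoders and onmt.decoders.

```python
import torch
import torch.nn as nn


class CNNEncoder(nn.Module):
    """Toy CNN that turns an image into a sequence of feature vectors."""

    def __init__(self, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, hidden, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, images):
        feats = self.conv(images)                # (batch, hidden, h, w)
        return feats.flatten(2).transpose(1, 2)  # (batch, h*w, hidden)


class TokenDecoder(nn.Module):
    """Toy GRU decoder (assumption: no attention, for brevity)."""

    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.generator = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, memory):
        # Initialise the decoder state from the mean-pooled image features.
        h0 = memory.mean(dim=1).unsqueeze(0)     # (1, batch, hidden)
        states, _ = self.rnn(self.embed(tokens), h0)
        return self.generator(states)            # (batch, seq_len, vocab)


class EDD(nn.Module):
    """One shared encoder, one decoder for HTML tags, one for cell content."""

    def __init__(self, structure_vocab, cell_vocab, hidden=64):
        super().__init__()
        self.encoder = CNNEncoder(hidden)
        self.structure_decoder = TokenDecoder(structure_vocab, hidden)
        self.cell_decoder = TokenDecoder(cell_vocab, hidden)

    def forward(self, images, structure_tokens, cell_tokens):
        memory = self.encoder(images)  # computed once, used by both decoders
        return (self.structure_decoder(structure_tokens, memory),
                self.cell_decoder(cell_tokens, memory))


def joint_loss(structure_logits, cell_logits, structure_gold, cell_gold, weight=0.5):
    """Weighted sum of the two cross-entropy losses (weighting is an assumption)."""
    ce = nn.CrossEntropyLoss()
    structure_loss = ce(structure_logits.flatten(0, 1), structure_gold.flatten())
    cell_loss = ce(cell_logits.flatten(0, 1), cell_gold.flatten())
    return weight * structure_loss + (1.0 - weight) * cell_loss
```

Calling EDD(structure_vocab=10, cell_vocab=50) on a batch of 3-channel images plus two token tensors returns one logits tensor per decoder. In a real setup you would also shift the decoder inputs relative to the gold targets and decide how the cell decoder is tied to individual table cells, which this sketch leaves out.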

Thank you @francoishernandez for your response.

Can you point me to an example that shows how to use the existing encoders and decoders?
I tried the image-to-text tutorial, but no coding was required :slight_smile:

It would be helpful if you could point me to a link that shows how to use the onmt package.

Thank you in advance.

See the contents of onmt.encoders and onmt.decoders for example encoders and decoders.

For a quickstart on how to use the onmt library, you can refer to the docs.