EDD architecture - Encoder and dual decoder for image to text

Sharathmk99 · July 15, 2020, 10:49pm

Hi Everyone,

I’m trying to implement a model to extract table structure. I came across a good paper from IBM, https://arxiv.org/pdf/1911.10683, which suggests using a single CNN encoder and two decoders: one decoder for structure extraction, which generates HTML tags from the image, and another decoder for cell content extraction.

Can you guys help me implement the EDD architecture using OpenNMT?
Thank you in advance.

francoishernandez · July 16, 2020, 12:31pm

Hi there,
A few pointers:

for data loading you can probably use either the existing image inputter, or the vector one;
for encoder / decoder, you can define those in onmt.encoders and onmt.decoders, based on the already existing ones;
you probably need to define a specific generator as well, making use of both decoder parts.
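
To make the shape of the problem concrete, here is a rough plain-PyTorch sketch of one shared CNN encoder feeding two decoders. It is not OpenNMT code and not the paper's implementation: the class names (SharedCNNEncoder, SimpleDecoder, EDDSketch), the layer sizes, and the vocabulary sizes are all made up for illustration, and the decoders are plain LSTMs initialised from mean-pooled image features, with no attention and no interaction between the two decoders.

```python
import torch
import torch.nn as nn


class SharedCNNEncoder(nn.Module):
    """Hypothetical small CNN: image -> sequence of feature vectors."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, hidden_size, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, images):
        feats = self.conv(images)                # (B, C, H/4, W/4)
        return feats.flatten(2).transpose(1, 2)  # (B, H/4 * W/4, C)


class SimpleDecoder(nn.Module):
    """Hypothetical LSTM decoder conditioned on pooled image features."""

    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, memory):
        # Assumption: use the mean of the encoder features as the initial state.
        h0 = memory.mean(dim=1).unsqueeze(0)     # (1, B, C)
        c0 = torch.zeros_like(h0)
        states, _ = self.rnn(self.embed(tokens), (h0, c0))
        return self.out(states)                  # (B, T, vocab_size)


class EDDSketch(nn.Module):
    """One encoder, two decoders: HTML structure tokens and cell content tokens."""

    def __init__(self, structure_vocab, cell_vocab, hidden_size=64):
        super().__init__()
        self.encoder = SharedCNNEncoder(hidden_size)
        self.structure_decoder = SimpleDecoder(structure_vocab, hidden_size)
        self.cell_decoder = SimpleDecoder(cell_vocab, hidden_size)

    def forward(self, images, structure_tokens, cell_tokens):
        memory = self.encoder(images)            # computed once, shared by both
        structure_logits = self.structure_decoder(structure_tokens, memory)
        cell_logits = self.cell_decoder(cell_tokens, memory)
        return structure_logits, cell_logits


# Toy usage with random data (vocabulary sizes are arbitrary).
model = EDDSketch(structure_vocab=20, cell_vocab=100)
images = torch.randn(2, 3, 32, 64)
structure_tokens = torch.randint(0, 20, (2, 5))
cell_tokens = torch.randint(0, 100, (2, 7))
structure_logits, cell_logits = model(images, structure_tokens, cell_tokens)
```

Training would then typically sum a cross-entropy loss over each decoder's logits; how the two losses are weighted, and how the cell decoder is tied to individual cells, is left open here.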

Sharathmk99 · July 16, 2020, 10:24pm

Thank you @francoishernandez for your response.

Can you point me to an example I can refer to for using the existing encoders and decoders?
I tried the image-to-text tutorial, but it didn't require any coding.

It would be helpful if you could point me to a link that shows how to use the onmt package.

Thank you in advance.

francoishernandez · July 17, 2020, 7:10am

Here: https://github.com/OpenNMT/OpenNMT-py/tree/master/onmt
See the contents of the encoders and decoders directories for example encoders and decoders, respectively.

For a quickstart on how to use the onmt library, you can refer to the docs.