How to go about recreating a basic seq2seq model

Hi everyone,

TL;DR
What's the best way (or is there a tutorial, perhaps) to reproduce an OpenNMT-py implementation of a particular seq2seq model in plain PyTorch?

I have used OpenNMT-py for some morphological processing; specifically, a 2-layer LSTM for both the encoder and decoder, with global attention.

I am now trying to reproduce this model in "raw" PyTorch so I can make some changes on the encoder side (combining character and word embeddings).
I realize that OpenNMT might very well support embedding combinations already (at least OpenNMT-tf does), but I also want to try this as an exercise.
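For context, here is roughly what I mean by combining the two embedding types, a minimal sketch where a character-level LSTM summary of each word is concatenated with its word embedding (all sizes and module names here are my own placeholders, not from any OpenNMT code):

```python
import torch
import torch.nn as nn

class CharWordEmbedding(nn.Module):
    """Concatenate a word embedding with a char-LSTM summary of the word.
    All dimensions are illustrative placeholders."""

    def __init__(self, word_vocab, char_vocab, word_dim=64, char_dim=16, char_hidden=32):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True)
        self.out_dim = word_dim + char_hidden  # size seen by the encoder

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, s, L = char_ids.shape
        w = self.word_emb(word_ids)                     # (b, s, word_dim)
        c = self.char_emb(char_ids.view(b * s, L))      # (b*s, L, char_dim)
        _, (h, _) = self.char_lstm(c)                   # h: (1, b*s, char_hidden)
        c_summary = h[-1].view(b, s, -1)                # last hidden state per word
        return torch.cat([w, c_summary], dim=-1)        # (b, s, out_dim)
```

The encoder's input size then just becomes `out_dim` instead of the word embedding size.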

The problem is that my implementation yields much lower accuracy (an absolute drop of about 8%). I am using Bahdanau attention instead of the global (Luong) attention I used in OpenNMT, but I doubt that difference alone explains such underperformance. I'm clearly doing something wrong :slight_smile:
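To rule out the attention variant as the culprit, one thing I could try is swapping in Luong-style global attention. A minimal sketch of the "general" scoring variant (score(h_t, h_s) = h_t^T W h_s, computed over the decoder *output*, unlike Bahdanau, which scores the previous decoder state before the RNN step); the class and argument names are mine:

```python
import torch
import torch.nn as nn

class LuongGlobalAttention(nn.Module):
    """Luong 'general' global attention: score(h_t, h_s) = h_t^T W h_s."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)

    def forward(self, dec_h, enc_outputs):
        # dec_h: (batch, dim) current decoder output
        # enc_outputs: (batch, src_len, dim) all encoder states
        scores = torch.bmm(enc_outputs, self.W(dec_h).unsqueeze(-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)               # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights                               # context: (batch, dim)
```

In OpenNMT-py this context vector is concatenated with the decoder output and passed through a linear + tanh layer before the generator, which is easy to forget when reimplementing.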

So I thought maybe I could tweak the OpenNMT code or, better yet, somehow write "my own ONMT" that models just a 2-layer LSTM encoder/decoder with global attention. I have been looking at the OpenNMT-py code and do have some specific questions, but before asking them, I wonder if there is any general advice.
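For concreteness, this is the overall shape I have in mind, a teacher-forced sketch with dot-product global attention over the encoder outputs (hyperparameters and attention details are illustrative, not OpenNMT-py's exact defaults):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal 2-layer LSTM encoder/decoder sketch with dot-product
    global attention; not OpenNMT-py's exact architecture."""

    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.out = nn.Linear(hid * 2, tgt_vocab)  # [dec_out; context] -> vocab

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_emb(src))     # state seeds decoder
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # teacher forcing
        # global (dot) attention at every decoder step
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2)) # (b, tgt_len, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                # (b, tgt_len, hid)
        return self.out(torch.cat([dec_out, context], dim=-1))
```

If anyone sees an obvious gap between this and what OpenNMT-py actually does (input feeding, dropout placement, the post-attention tanh layer, etc.), that's exactly the kind of detail I'm hoping to learn about.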