First off, this is a great package and I have been able to build lots of interesting projects using it. Many thanks for that!
Secondly, I recently modified parts of the code in my local copy so that translate.lua can produce two additional outputs:
Embeddings for input sentences. This differs from tools/extract_embeddings.lua (which outputs word embeddings) in that, for a given model and input text, we output the final hidden state of the encoder, which the decoder is subsequently conditioned on. This allows us to cluster and analyze sentences/paragraphs using the resulting vector representations.
The n-best list on STDOUT. I added a flag -print_n_best that prints the n-best list to STDOUT, not just the 1-best translation. The n_best flag seems to make the decoder consider the n-best list during beam search and print the n-best outputs to STDERR, but not to STDOUT.
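To make the first point more concrete, here is a minimal Python sketch of the kind of downstream analysis the sentence embeddings would enable: finding the sentence whose vector is closest to a given one by cosine similarity. The file format (one vector per line, space-separated floats), the function names, and the sample values are all assumptions made up for illustration; they are not what translate.lua currently writes.

```python
import math


def parse_vectors(text):
    """Parse one vector per non-empty line of space-separated floats.

    This line-based format is a hypothetical choice for the sketch.
    """
    return [
        [float(x) for x in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(vectors, query_index):
    """Return (index, score) of the vector most similar to the query,
    excluding the query itself."""
    query = vectors[query_index]
    best_index, best_score = None, -2.0  # cosine is always >= -1
    for i, vector in enumerate(vectors):
        if i == query_index:
            continue
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Made-up 3-dimensional "sentence embeddings"; real encoder states
# would have hundreds of dimensions.
sample = """\
0.9 0.1 0.0
0.8 0.2 0.1
0.0 0.1 0.9
"""

vectors = parse_vectors(sample)
index, score = nearest_neighbor(vectors, 0)
print("nearest to sentence 0: sentence", index)
```

The same vectors could of course be fed to any clustering routine; cosine similarity is used here only because it needs nothing beyond the standard library.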
If there are easy fixes to the above two issues, then that’s great. Otherwise, could I submit a pull request to add these features? One thing to note is that this is my first time writing Lua, so my guess is that the request will have to go through a few iterations before being accepted.