I recently read your paper, and it mentions that you have implemented William Chan’s paper - Listen, Attend and Spell (https://arxiv.org/abs/1508.01211) - as the PDBiEncoder.lua model. I would like to use this encoder on a speech dataset like WSJ.
Could you please point me to some documentation describing the required parameters and the calls to the preprocess, train and translate scripts? The OpenNMT paper says to replace the encoder with the PDBiEncoder, but I am unable to understand the internal feature representation that train.lua builds from the text data (src and tgt), and hence am unable to replace it with speech features.
Looking forward to your replies!
The implementation is not final yet, but you can look at this PR: https://github.com/OpenNMT/OpenNMT/pull/168
We plan to finalize the tests and integrate the PR next week.
I’ve noticed that the predefined models in OpenNMT-tf include an implementation of the Listen, Attend and Spell paper. Is this a finalised implementation? Where can I find the documentation to use it?
Thanks a lot!
The implementation is complete but has not been heavily tested. There are a few differences from other models in terms of requirements. The key difference is that the source input must be a
TFRecords file. There is a section in the documentation that explains how to generate this file: http://opennmt.net/OpenNMT-tf/data.html#vectors
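For illustration, here is a minimal sketch of how speech feature matrices (one 2-D array of frames x feature dims per utterance) could be serialized into a TFRecords file with standard TensorFlow APIs. The feature keys `"shape"` and `"values"` and the file name are assumptions for the sake of the example - check the linked documentation for the exact record layout OpenNMT-tf expects.

```python
import numpy as np
import tensorflow as tf

def write_vectors(path, matrices):
    """Serialize a list of 2-D float arrays (time x feature_dim) as TFRecords.

    Each record stores the matrix shape as an int64 list and the flattened
    float values, so the reader can restore the original 2-D layout.
    NOTE: the "shape"/"values" keys are an assumption; verify against the
    OpenNMT-tf data documentation before using this for real training data.
    """
    with tf.io.TFRecordWriter(path) as writer:
        for m in matrices:
            m = np.asarray(m, dtype=np.float32)
            example = tf.train.Example(features=tf.train.Features(feature={
                "shape": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=list(m.shape))),
                "values": tf.train.Feature(
                    float_list=tf.train.FloatList(value=m.flatten())),
            }))
            writer.write(example.SerializeToString())

# Usage: e.g. 40-dimensional filterbank features for two utterances
# of 120 and 95 frames (random data here, stand-ins for real features).
utterances = [np.random.rand(120, 40), np.random.rand(95, 40)]
write_vectors("train-source.records", utterances)
```

Each utterance becomes one record, so the file can be streamed sequentially at training time without loading all features into memory.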