I recently read your paper, which mentions that you have implemented William Chan’s paper, Listen, Attend and Spell (https://arxiv.org/abs/1508.01211), in the PDBiEncoder.lua model. I would like to use this encoder on a speech dataset such as WSJ.
Could you please point me to documentation describing the required parameters and how to call the preprocess, train, and translate scripts? The OpenNMT paper says to replace the encoder with the PDBiEncoder, but I do not understand the internal feature representation that train.lua obtains from the text data (src and tgt), so I am unable to substitute speech features for it.
Looking forward to your reply!