Accessing the “keys” of the data entries in a batch

Hi,

Could you please tell me how to access the “keys” of all the data entries in a particular batch? I am using the ‘feattext’ data type with -idx_files. I can access the keys during preprocessing, but how do I access them during training?

Thank you!

Hello,

The “keys” are only used to align entries during preprocessing, and this information is not stored afterwards. What is your use case?
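
To illustrate why the keys are no longer available at training time, here is a minimal Python sketch of key-based alignment. It is not the actual OpenNMT preprocessing code: the function name `align_by_key` and the sample data are hypothetical. The keys serve only to pair up entries, and the result keeps just the paired values:

```python
def align_by_key(src_entries, tgt_entries):
    """Pair source and target values that share a key; the keys are then dropped."""
    pairs = []
    for key, src_value in src_entries.items():
        if key in tgt_entries:  # skip entries with no match on the target side
            pairs.append((src_value, tgt_entries[key]))
    return pairs


# Hypothetical key-indexed data: utterance id -> feature vectors / transcript.
src = {"utt1": [[0.1, 0.2], [0.3, 0.4]], "utt2": [[0.5, 0.6]]}
tgt = {"utt2": "hello", "utt1": "good morning", "utt3": "unmatched"}

aligned = align_by_key(src, tgt)
print(len(aligned))  # 2
```

In a scheme like this, carrying the key along as a third element of each pair would be one way to keep it, but the batching code would then also need to pass it through.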

Hi,

I want to append more features at the end of the encoder (to the encStates variable) and pass them into the decoder. That is why I am trying to access the “keys” of the features in a particular batch. I am trying different adaptation strategies for speech recognition when extra features are available. (If the word features could work for the “feattext” data type as well, it would be really great! Please look into it.)

In preprocess.lua, I tried adding my extra features to the “data” variable itself, alongside gVectors and gFeatures, but I am stuck trying to access the new features in Seq2Seq.lua. There the data arrives as a batch, and I am not sure how to access the new features within Batch.lua.

What would be the best way for me to add more features at the end of the encoder?

This is supported in OpenNMT-tf with the ParallelInputter. Basically, you can feed two files as inputs and concatenate their representations in depth (for example, your speech vectors and a word embedding).
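
Conceptually, concatenating in depth means joining the two representations along the feature dimension at each time step. Here is a minimal pure-Python sketch of that idea. It does not use the OpenNMT-tf API: the function `concat_in_depth` and the toy vectors are made up for illustration, and it assumes both inputs have the same number of time steps.

```python
def concat_in_depth(seq_a, seq_b):
    """Concatenate two sequences of vectors along the depth (feature) dimension."""
    if len(seq_a) != len(seq_b):
        raise ValueError("both inputs must have the same number of time steps")
    # At each time step, append the second vector to the first.
    return [vec_a + vec_b for vec_a, vec_b in zip(seq_a, seq_b)]


# Toy data: 2 time steps; speech vectors of depth 3, extra features of depth 2.
speech = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
extra = [[1.0, 0.0], [0.0, 1.0]]

combined = concat_in_depth(speech, extra)
print(len(combined), len(combined[0]))  # 2 5
```

The sequence length is unchanged and the depth becomes the sum of the two input depths, so whatever consumes the result sees a single, wider input.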

The new workflow may be a bit confusing for long-time OpenNMT users, but I would be happy to assist. See the documentation and existing models.

Thanks a lot for your reply to this question, Guillaume. I have started working with OpenNMT-tf and I am excited to explore the many new possibilities it presents. Thanks for this version!