Questions regarding get_fields() function of inputters/inputter.py

Hi,
I have a few questions about the get_fields() function in inputters/inputter.py:

  • Why is ‘include_lengths’ True for the source but not for the target? I understand that it is passed to the torchtext.data.Field object, but I am not sure how and where it is used later.
  • Why are ‘bos’ and ‘eos’ added only to the target and not to the source?
  • What is the indices field, and where and how is it used?

Any suggestion would be much appreciated.

Regards.

Hi,

  1. If I remember correctly, on the target side the length is inferred from the token IDs. Because the source input can be non-text data, its length has to be provided explicitly.
  2. BOS and EOS are special control symbols used during dynamic decoding, which only concerns the target side.
  3. Not sure what you are referring to.
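
To make points 1 and 2 concrete, here is a rough sketch in plain Python of what the two options change in a padded batch. It is not torchtext's implementation: the helper name pad_batch and the token strings are made up for illustration. The only behaviour it mirrors is that a Field with include_lengths=True returns a (padded batch, lengths) pair instead of the padded batch alone.

```python
# Hypothetical helper and token strings, not part of torchtext or OpenNMT-py.
PAD, BOS, EOS = "<blank>", "<s>", "</s>"


def pad_batch(examples, include_lengths=False, init_token=None, eos_token=None):
    """Pad token lists to a common length, loosely mimicking a text field."""
    wrapped = []
    for tokens in examples:
        seq = list(tokens)
        if init_token is not None:
            seq = [init_token] + seq   # prepend the start symbol
        if eos_token is not None:
            seq = seq + [eos_token]    # append the end symbol
        wrapped.append(seq)

    # Lengths are measured before padding, after any control symbols are added.
    lengths = [len(seq) for seq in wrapped]
    max_len = max(lengths)
    padded = [seq + [PAD] * (max_len - len(seq)) for seq in wrapped]
    return (padded, lengths) if include_lengths else padded


src_batch = [["hello", "world"], ["hi"]]
tgt_batch = [["bonjour", "le", "monde"], ["salut"]]

# Source side: no control symbols, but the true lengths come back with the batch.
src, src_lengths = pad_batch(src_batch, include_lengths=True)

# Target side: control symbols are added, and no lengths are returned.
tgt = pad_batch(tgt_batch, init_token=BOS, eos_token=EOS)
```

The lengths returned for the source are the kind of information an encoder typically needs to ignore padding positions, while the BOS and EOS symbols on the target give the decoder a first input and a stopping signal.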

Hi Guillaume,
Thanks for the information. By indices field, I was referring to the following in the get_fields() function of inputters/inputter.py:

indices = Field(use_vocab=False, dtype=torch.long, sequential=False)

It seems these are needed by the ‘Batch’ object during translation?
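
For context, use_vocab=False together with sequential=False means the field holds one raw integer per example, with no tokenization and no vocabulary lookup. A plausible use for such a field, stated here as an assumption rather than something confirmed in this thread, is to record each example's position in the dataset so that outputs can be put back in the original order after batches are sorted by length. A minimal sketch of that idea in plain Python, with made-up data and a stand-in for the model:

```python
# Hypothetical data, for illustration only.
sentences = ["a b c", "d", "e f"]

# Attach each example's position in the dataset as an integer index.
examples = [(i, s.split()) for i, s in enumerate(sentences)]

# Sorting a batch by length (longest first) loses the original order.
batch = sorted(examples, key=lambda ex: len(ex[1]), reverse=True)
batch_indices = [i for i, _ in batch]

# Stand-in for a model: "translate" by upper-casing the tokens.
outputs = [" ".join(tok.upper() for tok in toks) for _, toks in batch]

# Use the stored indices to restore dataset order.
restored = [out for _, out in sorted(zip(batch_indices, outputs))]
```

Without the stored index there would be no way to tell which output belongs to which input line once the batch has been reordered.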