I am training on a small vocabulary and am looking for parameters I can adjust for that case.
My basic question concerns these two parameters: do they represent the expected maximum length of the source and target sequences in tokens?
```yaml
# (optional) The maximum length of feature sequences during training (default: None).
maximum_features_length: 70

# (optional) The maximum length of label sequences during training (default: None).
maximum_labels_length: 70
```
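
For context, here is roughly where these two keys sit in my run configuration. This is only a minimal sketch: the file paths and other values are placeholders from my setup, and the assumption that the keys belong under the `train:` section may differ by OpenNMT-tf version.

```yaml
# Minimal run config sketch (assumed layout; adjust paths and placement to your version).
model_dir: run/

data:
  train_features_file: data/src-train.txt   # source side
  train_labels_file: data/tgt-train.txt     # target side

train:
  batch_size: 64
  # The two parameters in question, set to 70 on the assumption that
  # they are counted in tokens for the source and target sequences.
  maximum_features_length: 70
  maximum_labels_length: 70
```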