Implement a different style of (input-only) features that allows a variable number of additive features per word.
What is the use case for this? I believe it will also be quite tricky to implement.
I think we can always fall back to a fixed number of features by using a dummy label. For example, with the case feature, the N
label marks tokens where case does not apply.
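To make the dummy-label idea concrete, here is a minimal sketch in plain Python. It is not taken from any existing codebase: the function name `pad_features` and the `<pad>` label are hypothetical, chosen only for illustration. It pads each word's feature list to the length of the longest one, so every word ends up with the same number of features.

```python
# Hypothetical dummy label; any label reserved for "does not apply" would do.
PAD = "<pad>"


def pad_features(features_per_word, pad=PAD):
    """Pad each word's feature list with a dummy label to a common length."""
    # The target width is the largest number of features on any word.
    width = max((len(feats) for feats in features_per_word), default=0)
    return [list(feats) + [pad] * (width - len(feats))
            for feats in features_per_word]


if __name__ == "__main__":
    # Three words carrying two, one, and zero features respectively.
    print(pad_features([["a", "b"], ["c"], []]))
```

The cost of this approach is that the width is set by the word with the most features, so words with few features carry mostly dummy labels.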
I wanted to implement subword embeddings by defining multiple n-gram features per source word.
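A rough sketch of what that could look like, in plain Python and under stated assumptions: the helper names `char_ngrams` and `additive_embedding`, the `<` and `>` boundary markers, and the lookup table are all hypothetical and not part of any existing API. Each word yields a variable number of character n-grams, and their vectors are summed into a single word vector.

```python
def char_ngrams(word, n_min=2, n_max=3):
    """Return the character n-grams of a word, with boundary markers added."""
    marked = f"<{word}>"  # assumption: '<' and '>' mark the word boundaries
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def additive_embedding(ngrams, table, dim):
    """Sum the vectors of all known n-grams; unknown n-grams are skipped."""
    total = [0.0] * dim
    for gram in ngrams:
        vec = table.get(gram)
        if vec is None:
            continue  # n-gram not in the table, contributes nothing
        total = [t + v for t, v in zip(total, vec)]
    return total


if __name__ == "__main__":
    grams = char_ngrams("cat", 3, 3)
    # Toy table with 2-dimensional vectors for two of the n-grams.
    table = {"<ca": [1.0, 2.0], "at>": [0.5, 0.5]}
    print(grams, additive_embedding(grams, table, dim=2))
```

Because the number of n-grams depends on the word length, this is a case where a fixed feature count per word would need padding with a dummy label.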