Variable size features

Implement a different style of (input-only) features that allows a variable number of additive features per word.
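
A minimal sketch of what "additive" could mean here, assuming each feature value maps to an embedding vector and a word's feature vector is the element-wise sum over however many features that word carries. The embedding table, its values, and the helper name are hypothetical and not part of any existing OpenNMT API.

```python
from typing import Dict, List

Vector = List[float]

def additive_feature_embedding(features: List[str],
                               table: Dict[str, Vector],
                               dim: int) -> Vector:
    """Sum the embeddings of a variable-size set of feature values.

    Unknown values are skipped, so a word with no known features
    ends up with a zero vector.
    """
    total = [0.0] * dim
    for value in features:
        vec = table.get(value)
        if vec is None:
            continue  # assumed policy: ignore out-of-vocabulary feature values
        for i in range(dim):
            total[i] += vec[i]
    return total

# Toy 2-dimensional embedding table with made-up values.
table = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [0.5, 0.5]}

# Words carry different numbers of features, yet every output has size `dim`.
sentence_features = [["a"], ["a", "b"], ["a", "b", "c"], []]
embedded = [additive_feature_embedding(f, table, 2) for f in sentence_features]
print(embedded)  # [[1.0, 0.0], [1.0, 2.0], [1.5, 2.5], [0.0, 0.0]]
```

Summing keeps the input size independent of the feature count, which is what makes a variable number of features per word possible without changing the rest of the network.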

What is the use case for this? I believe it would also be quite tricky to implement.

I think we can always come back to a fixed number of features by using a dummy label. For example, with the case feature, the N label is used to mark tokens where case does not apply.
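
A sketch of that workaround, assuming every word's feature list is padded to the largest feature count in the sentence and reusing N as the dummy label. `pad_features` is a hypothetical helper; a corpus-wide maximum would work the same way.

```python
from typing import List

def pad_features(word_features: List[List[str]], dummy: str = "N") -> List[List[str]]:
    """Pad every word's feature list to the same width with a dummy label."""
    width = max((len(f) for f in word_features), default=0)
    return [f + [dummy] * (width - len(f)) for f in word_features]

padded = pad_features([["x"], ["x", "y", "z"], []])
print(padded)  # [['x', 'N', 'N'], ['x', 'y', 'z'], ['N', 'N', 'N']]
```

After padding, each slot can be treated as an ordinary fixed-position feature, at the cost of many dummy slots when the feature counts vary widely between words.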

I wanted to implement subword embeddings by defining multiple n-gram features per source word.
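
One way such per-word features could be produced, assuming fastText-style character n-grams where the word is wrapped in `<` and `>` boundary markers. The n range of 3 to 4 is an arbitrary choice for illustration, and fastText additionally keeps the whole wrapped word as an extra feature, which is omitted here.

```python
from typing import List

def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> List[str]:
    """Return the character n-grams of `word` wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(char_ngrams("the"))   # ['<th', 'the', 'he>', '<the', 'the>']
print(char_ngrams("cats"))  # ['<ca', 'cat', 'ats', 'ts>', '<cat', 'cats', 'ats>']
```

The number of n-grams grows with word length (5 for "the", 7 for "cats"), so a fixed feature count would force heavy padding, while additive features handle it directly.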