I noticed that some tokenized corpora have tags that others don’t. For example, some pt corpora only have “hun” and “id” tags, while some es corpora have “tree”, “lem”, “id” and “svmtool”. Is it safe to assume that any (coherent) tag can be added to a tokenized corpus to enhance the training process? My idea is to add something like a “context” tag, since a single word can have multiple meanings depending on the context. Would that be possible? If so, would it make a significant difference when translating?
Thanks in advance.