OpenNMT Forum

Are composite tokens possible?

For seq2seq translation, instead of treating each multi-character token as a unique vocabulary entry, is it possible for every token to be a single character, and for the target side to output more than one character at a time?

Essentially, if the seq2seq model ordinarily outputs a one-hot vector (with the 1 marking whichever token is predicted), is it instead possible for it to output a vector containing multiple ones?
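
To make the distinction concrete, here is a minimal sketch in plain Python. It assumes a hypothetical five-character vocabulary; the names `VOCAB`, `one_hot` and `multi_hot` are made up for illustration and are not part of OpenNMT.

```python
# Hypothetical character vocabulary, for illustration only.
VOCAB = ["a", "b", "c", "d", "e"]

def one_hot(char):
    """Usual target: exactly one position is set to 1."""
    return [1 if v == char else 0 for v in VOCAB]

def multi_hot(chars):
    """Proposed target: every character in `chars` sets its position to 1."""
    present = set(chars)
    return [1 if v in present else 0 for v in VOCAB]

single = one_hot("c")       # one character per decoding step
several = multi_hot("bd")   # several characters in a single decoding step
```

In a neural model the one-hot case is usually paired with a softmax over the vocabulary, whereas a multi-hot target would more likely need an independent yes/no prediction per character. That is a general observation about multi-label outputs, not a description of anything OpenNMT provides out of the box.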

How would you know the order in which the characters appear?
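
A small sketch of why this matters, assuming a hypothetical three-character vocabulary and an illustrative `multi_hot` helper (neither comes from OpenNMT): a vector with multiple ones only records which characters are present, so different strings can collapse to the same target.

```python
# Hypothetical character vocabulary, for illustration only.
VOCAB = ["a", "b", "c"]

def multi_hot(chars):
    """Mark every character that occurs in `chars` with a 1."""
    present = set(chars)
    return [1 if v in present else 0 for v in VOCAB]

ab = multi_hot("ab")
ba = multi_hot("ba")
aab = multi_hot("aab")

# "ab" and "ba" produce the same vector, so the order cannot be recovered.
order_lost = ab == ba
# "ab" and "aab" also match, so repeated characters cannot be recovered either.
repeats_lost = ab == aab
```

Any scheme along these lines would need some extra signal, such as a position for each character, to decode the output unambiguously.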