When using pre-trained embeddings, what should be done with the special tokens?
<blank> <unk> <s> </s>
I suppose I have to add <s> and </s> tokens to my sentences when training my own embeddings?
It seems that DL4J doesn’t save any <unk> token, or anything similar, in its word-vector files. Is this token always supposed to be a null vector?
Do I also need to write special handling that detects where <unk> and <blank> are needed and adds them to the input sentences, so that a vector can be properly built for them in the result set?
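
To make the question concrete, here is a rough sketch of the preprocessing I have in mind. It is plain Python rather than DL4J code, and the names `prepare`, `vectorize`, `vocab` and `max_len` are made up for illustration. Falling back to an all-zero vector for tokens missing from the embedding file is only an assumption, and is exactly the part I am unsure about.

```python
from typing import Dict, List, Set

# Special tokens from the question; the padding/OOV roles are my assumption.
BLANK, UNK, BOS, EOS = "<blank>", "<unk>", "<s>", "</s>"


def prepare(tokens: List[str], vocab: Set[str], max_len: int) -> List[str]:
    """Wrap a tokenized sentence in <s>/</s>, replace out-of-vocabulary
    words with <unk>, and pad with <blank> up to max_len (assumed >= 2)."""
    body = [t if t in vocab else UNK for t in tokens]
    # Truncate the body so that <s> and </s> always fit.
    seq = [BOS] + body[: max_len - 2] + [EOS]
    return seq + [BLANK] * (max_len - len(seq))


def vectorize(seq: List[str], vectors: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up each token; tokens absent from the embedding table
    (for example <unk> or <blank>) get an all-zero vector (assumption)."""
    zero = [0.0] * dim
    return [vectors.get(t, zero) for t in seq]


if __name__ == "__main__":
    seq = prepare(["the", "zebra"], vocab={"the"}, max_len=6)
    print(seq)
    print(vectorize(seq, {"the": [1.0, 2.0]}, dim=2))
```

Is this roughly the right approach, or should <unk> and <blank> get trained or randomly initialized vectors instead of zeros?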