When using pre-trained embeddings, what should be done with special tokens such as `<blank>`, `<unk>`, `<s>`, and `</s>`?
I suppose I have to add `<s>` and `</s>` tokens to my sentences when training my own embeddings?
It seems that DL4J doesn't save an `<unk>` token (or anything similar) in its word-vectors files. Is this token always supposed to be a null vector?
Do I also need special handling to find and add `<blank>` to the input sentences, so that I can properly build a vector for them in the result set?
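To make the question concrete, here is roughly the handling I have in mind, sketched in Python with a toy embedding table (the table, dimensions, and helper names are all hypothetical for illustration; this is not the DL4J API):

```python
import numpy as np

DIM = 4
# Pretend this table was loaded from a saved word-vectors file.
# Note there is no <unk> entry, which is the situation I'm asking about.
embeddings = {
    "<s>":   np.full(DIM, 0.1),
    "</s>":  np.full(DIM, 0.2),
    "hello": np.full(DIM, 0.3),
}

def lookup(token):
    # Assumed fallback: treat any out-of-vocabulary token as a
    # null (zero) vector, since the file has no <unk> entry.
    return embeddings.get(token, np.zeros(DIM))

def sentence_to_vectors(tokens):
    # Wrap the sentence with boundary tokens before lookup.
    wrapped = ["<s>"] + tokens + ["</s>"]
    return [lookup(t) for t in wrapped]

# "world" is out-of-vocabulary, so it maps to the zero vector.
vecs = sentence_to_vectors(["hello", "world"])
```

Is this zero-vector fallback (and the explicit `<s>`/`</s>` wrapping) the intended way to use these files, or does DL4J expect something else?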