Pre-trained embeddings: what about special tokens?

Etienne38 · February 21, 2017, 8:07am

When using pre-trained embeddings, what should be done with the special tokens (<blank>, <unk>, <s>, </s>)?

I suppose I need to add <s> and </s> tokens to my sentences when training my own embeddings?
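Concretely, I mean wrapping each sentence like this before giving the corpus to the embedding trainer. This is only a minimal Python sketch: it assumes whitespace-tokenized sentences, and the function name is made up for illustration.

```python
def add_sentence_markers(sentences, bos="<s>", eos="</s>"):
    """Wrap each whitespace-tokenized sentence with begin/end markers.

    Hypothetical helper: assumes tokens are already separated by spaces.
    """
    return [[bos] + s.split() + [eos] for s in sentences]


corpus = ["the cat sat", "hello world"]
print(add_sentence_markers(corpus))
# [['<s>', 'the', 'cat', 'sat', '</s>'], ['<s>', 'hello', 'world', '</s>']]
```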

It seems that DL4J doesn’t save an <unk> token (or anything similar) in its word-vector files. Is this token always supposed to be a null vector?

Do I also need special handling to find <unk> and <blank> in the input sentences and add them, so that a vector is properly built for them in the resulting embedding set?
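The kind of special handling I have in mind is roughly the following Python sketch. The choices in it are my own assumptions, not what DL4J or any other toolkit does: <blank> (padding) gets an all-zero vector, and any other token missing from the pre-trained file (e.g. <unk>, <s>, </s>) gets a small random vector. The pre-trained vectors are assumed to be already loaded into a plain dict.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one vector per vocabulary entry, in vocabulary order.

    Assumptions (illustrative only):
    - `pretrained` maps token -> list of `dim` floats;
    - "<blank>" gets an all-zero vector;
    - any other token not in `pretrained` gets a random vector
      drawn uniformly from [-0.1, 0.1].
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    matrix = []
    for token in vocab:
        if token in pretrained:
            matrix.append(list(pretrained[token]))
        elif token == "<blank>":
            matrix.append([0.0] * dim)
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


vocab = ["<blank>", "<unk>", "<s>", "</s>", "cat"]
pretrained = {"cat": [0.5, -0.2, 0.1]}
matrix = build_embedding_matrix(vocab, pretrained, dim=3)
```

Whether <unk> should instead be a zero vector, the mean of the known vectors, or something learned during training is exactly what I'm unsure about.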