Have you tried building your models with some subword segmentation (BPE, Morfessor)?
Using it should noticeably reduce the out-of-vocabulary problem.
However, I think you should adapt your phrase table to the subword segmentation too, because the attention module (the one generating the soft alignments) will now be working at the subword level.
Also, if you go this route, don't forget to include a small post-processing step to reattach the subwords into full words. Notice that the input will have the form 'this is a sen* ten* ce pro* of'
and the output will be something similar to, for instance,
'est* o es un* a fras* e de prueb* a' in Spanish.
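The reattachment step can be as simple as gluing every marker-terminated token to the token that follows it. A minimal sketch, assuming the '*' continuation marker used in the examples above (the function name is mine):

```python
def join_subwords(text, marker="*"):
    """Merge subword tokens back into full words.

    A token ending in the marker (e.g. 'sen*') is glued to the
    following token, so 'sen* ten* ce' becomes 'sentence'.
    """
    # Removing 'marker + space' joins each continued subword
    # to its successor in a single pass.
    return text.replace(marker + " ", "")

print(join_subwords("est* o es un* a fras* e de prueb* a"))
# esto es una frase de prueba
```

If your segmenter uses a different convention (e.g. BPE's '@@' suffix), just swap the marker accordingly.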
Additionally, I would suggest implementing a slightly more sophisticated post-process to detect the odd words that can appear when the subword translation is reconstructed; you could use an extended dictionary or a phrase table to do so.
Notice that the translation errors will now be made at the subword level. So, if the input is 'this is a sen* ten* ce pro* of', the system can produce
'est* o es un* alg* un* a prueb* a'
which, once reattached, results in 'esto es unalguna prueba',
producing the word 'unalguna', which does not exist in Spanish.
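That dictionary-based check could be sketched like this: merge the subwords and then flag any reconstructed word missing from a reference vocabulary. The function name and the tiny vocabulary are mine, just for illustration:

```python
def flag_unknown_words(sentence, vocabulary, marker="*"):
    """Merge subwords, then return the words not found in the vocabulary.

    Flagged words are candidates for a smarter post-process
    (e.g. a lookup in an extended dictionary or a phrase table).
    """
    merged = sentence.replace(marker + " ", "")
    return [word for word in merged.split() if word not in vocabulary]

# Toy vocabulary standing in for an extended Spanish dictionary.
vocab = {"esto", "es", "una", "prueba", "frase", "de"}
print(flag_unknown_words("est* o es un* alg* un* a prueb* a", vocab))
# ['unalguna']
```

In practice you would load the vocabulary from your training data or an external word list rather than hard-coding it.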
Regarding numbers, as @guillaumekln told you here:
you can pre-process the training data to use placeholders for numbers and, after decoding, just substitute them with the corresponding source numbers (this may be the simplest approach). Remember that if you use this technique, the input to your system must have the placeholders for numbers too.
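A minimal sketch of that placeholder round-trip (the `<num>` token, the regex, and the function names are my assumptions, not something from a specific toolkit). Note it restores numbers in source order, so it assumes the placeholders are not reordered by the translation:

```python
import re

# Matches integers and numbers with ',' or '.' separators, e.g. 12.50 or 1,000.
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")

def replace_numbers(text, placeholder="<num>"):
    """Pre-process: swap each number for a placeholder, keeping the originals."""
    numbers = NUM_RE.findall(text)
    return NUM_RE.sub(placeholder, text), numbers

def restore_numbers(text, numbers, placeholder="<num>"):
    """Post-process: put the source numbers back, one placeholder at a time."""
    for n in numbers:
        text = text.replace(placeholder, n, 1)
    return text

src, nums = replace_numbers("the price is 12.50 for 3 items")
# src == 'the price is <num> for <num> items', nums == ['12.50', '3']
translated = "el precio es <num> por <num> articulos"  # hypothetical decoder output
print(restore_numbers(translated, nums))
# el precio es 12.50 por 3 articulos
```

Make sure the placeholder token is also present in your vocabulary and in the pre-processed training data, otherwise the model will never learn to copy it through.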
I hope that helps!