A new big release for OpenNMT!
Input vectors support
OpenNMT now supports arbitrary vectors as inputs on the source side, using the Kaldi text format. You could use this feature to build a speech-to-text model with OpenNMT, for example.
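To give an idea of what such an input file looks like, here is a minimal Python sketch that writes and reads feature matrices in the Kaldi text archive layout (an identifier, an opening bracket, one row of values per frame, and a closing bracket on the last row). The function names are hypothetical and the sketch is not part of OpenNMT; it assumes plain float features and at least one frame per utterance.

```python
def to_kaldi_text(utterances):
    """Serialize {utterance_id: [[float, ...], ...]} as Kaldi text matrices."""
    lines = []
    for utt_id, frames in utterances.items():
        lines.append(f"{utt_id}  [")
        for i, frame in enumerate(frames):
            row = " ".join(repr(float(v)) for v in frame)
            # The closing bracket sits at the end of the last frame.
            closing = " ]" if i == len(frames) - 1 else ""
            lines.append(f"  {row}{closing}")
    return "\n".join(lines) + "\n"


def from_kaldi_text(text):
    """Parse Kaldi text matrices back into {utterance_id: frames}."""
    result = {}
    current = None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if current is None:
            # Header line: "<id> [" (values may follow on the same line).
            current = tokens[0]
            result[current] = []
            tokens = tokens[1:]
            if tokens and tokens[0] == "[":
                tokens = tokens[1:]
        done = bool(tokens) and tokens[-1] == "]"
        if done:
            tokens = tokens[:-1]
        if tokens:
            result[current].append([float(t) for t in tokens])
        if done:
            current = None
    return result
```

Each utterance becomes one source "sentence" whose tokens are feature vectors instead of words; the target side stays regular text.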
Pretrained word embeddings
A tools/embeddings.lua script was added to improve the experience of using pretrained embeddings. It can generate word embeddings matching your prepared vocabulary, either from an online repository or from pretrained files (word2vec, GloVe, fastText).
Additionally, the model's word embedding size can now be larger than the pretrained one, and you can choose to fix only the pretrained part.
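The idea of a larger embedding with a fixed pretrained part can be sketched as follows. This is an illustrative Python example, not the code of tools/embeddings.lua: it assumes GloVe-style text lines ("word v1 v2 ..."), and the function names, the initialization range and the plain SGD update are all assumptions made for the example.

```python
import random


def load_pretrained(lines):
    """Parse GloVe-style text lines: a word followed by its vector values."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embeddings(vocab, pretrained, model_dim, seed=0):
    """Build one row per vocabulary word: pretrained values first, extra columns after."""
    rng = random.Random(seed)
    pre_dim = len(next(iter(pretrained.values())))
    if model_dim < pre_dim:
        raise ValueError("model_dim must be at least the pretrained size")
    table = []
    for word in vocab:
        base = pretrained.get(word)
        if base is None:
            # Word missing from the pretrained file: random values (sketch choice).
            base = [rng.uniform(-0.1, 0.1) for _ in range(pre_dim)]
        extra = [rng.uniform(-0.1, 0.1) for _ in range(model_dim - pre_dim)]
        table.append(list(base) + extra)
    return table, pre_dim


def sgd_step(table, grads, lr, fixed_dim):
    """Update only the columns at or after fixed_dim; the first columns stay frozen."""
    for row, grad in zip(table, grads):
        for j in range(fixed_dim, len(row)):
            row[j] -= lr * grad[j]
```

With a 2-dimensional pretrained file and model_dim=4, the first two columns of every row keep their values during training while the last two are learned.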
Bridge layer between the encoder and decoder
This new layer allows you to define how the encoder states are passed to the decoder (copy, linear projection, none). Except with the copy operation, the encoder and decoder can now have a different number of layers using the
Importance sampling
In addition to data sampling, importance sampling is a technique that restricts the target vocabulary to the words seen in the current data sample, which improves performance.
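As a rough illustration of the vocabulary reduction, the Python sketch below builds a reduced target vocabulary from the sentences of one sample and re-indexes tokens against it. It is a simplified example, not OpenNMT's implementation; the function names and the list of special tokens are assumptions.

```python
def reduce_target_vocab(sampled_targets,
                        specials=("<blank>", "<unk>", "<s>", "</s>")):
    """Collect the target words of the current sample, after the special tokens."""
    reduced = list(specials)
    index = {tok: i for i, tok in enumerate(reduced)}
    for sentence in sampled_targets:
        for tok in sentence:
            if tok not in index:
                index[tok] = len(reduced)
                reduced.append(tok)
    return reduced, index


def encode(sentence, index, unk="<unk>"):
    """Map tokens to reduced-vocabulary ids, falling back to the unknown token."""
    return [index.get(tok, index[unk]) for tok in sentence]
```

The output layer then only has to score the words of the reduced vocabulary for that sample, rather than the full target vocabulary.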
Better options parsing
Whether you are using the command line or configuration files, the option parser now generates more helpful error messages and also supports new usages:
- lists of values can be space-separated instead of comma-separated
- boolean options now accept explicit values (such as true) as arguments, or no argument as before (option flag)
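The two usages above can be mimicked with Python's argparse, purely to illustrate the behavior. This is not OpenNMT's parser (which is written in Lua); the option names -gpus and -verbose and the exact set of accepted boolean literals are assumptions made for the example.

```python
import argparse


def str2bool(value):
    """Convert an explicit boolean argument to a Python bool."""
    lowered = value.lower()
    if lowered in ("1", "true"):
        return True
    if lowered in ("0", "false"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def split_list(tokens):
    """Accept both '1 2' (separate tokens) and '1,2' (one comma-separated token)."""
    values = []
    for tok in tokens:
        values.extend(part for part in tok.split(",") if part)
    return values


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical list option: one or more space-separated values.
    parser.add_argument("-gpus", nargs="+", default=[])
    # Hypothetical boolean option: bare flag means True, or pass a value.
    parser.add_argument("-verbose", nargs="?", const=True, default=False,
                        type=str2bool)
    return parser
```

With this sketch, `-gpus 1 2 -verbose` and `-gpus 1,2 -verbose true` yield the same list and the same boolean value, and an invalid boolean such as `-verbose maybe` produces an explicit error message.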
As always, several bugs were fixed thanks to user reports and extended automated tests. See the release notes below for a complete list.
The complete release notes are also available in the repository: