OpenNMT v0.9 release

This was a long-awaited release. 🙂

Here are the main new features, with pointers to get started using them:

Dynamic dataset

This feature removes the preprocessing step and enables new training approaches. You can now store all the training data in a directory and define patterns that select which files are used during training. Training then randomly samples sentences according to the weight assigned to each pattern and tokenizes them on the fly.

This makes it possible both to work with larger training sets and to fine-tune the domain distribution of the sampled examples.
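To make the sampling idea concrete, here is a minimal Python sketch of weighted, pattern-based sampling followed by on-the-fly tokenization. It is not OpenNMT's implementation (which is written in Lua): the function names `sample_corpus` and `tokenize` are hypothetical, the in-memory `files` dict stands in for a directory, the patterns are shell-style globs (OpenNMT's actual pattern syntax may differ), and the whitespace split is a placeholder for a real tokenizer.

```python
import fnmatch
import random


def sample_corpus(files, pattern_weights, total, seed=0):
    """Draw about `total` sentences, splitting the budget across patterns by weight.

    files: dict mapping filename -> list of sentences (stands in for a directory).
    pattern_weights: dict mapping glob pattern -> relative weight (hypothetical format).
    """
    rng = random.Random(seed)
    weight_sum = sum(pattern_weights.values())
    sample = []
    for pattern, weight in pattern_weights.items():
        # Pool every sentence from the files whose name matches this pattern.
        pool = [s for name, sents in files.items()
                if fnmatch.fnmatch(name, pattern) for s in sents]
        if not pool:
            continue  # no matching file: this pattern's share is simply dropped
        # Quotas are rounded, so the final size can differ slightly from `total`.
        quota = round(total * weight / weight_sum)
        # Sample with replacement so that small domains can be oversampled.
        sample.extend(rng.choice(pool) for _ in range(quota))
    rng.shuffle(sample)
    return sample


def tokenize(sentence):
    # Placeholder for on-the-fly tokenization: a plain whitespace split.
    return sentence.split()


files = {
    "news.en": ["a b", "c d"],
    "medical.en": ["x y z"],
    "legal.en": ["l m"],  # matched by no pattern, so never sampled
}
weights = {"news*": 3, "medical*": 1}
batch = [tokenize(s) for s in sample_corpus(files, weights, total=8)]
print(len(batch))
```

With a 3:1 weighting, six of the eight sampled sentences come from the news file and two from the medical file, and nothing from the legal file. Changing the weights shifts the domain mix without rebuilding any preprocessed data.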

More information can be found in the documentation.

New tokenization features

Because tokenization can now be applied on the fly, additional features were needed to cover some specific use cases:

Advanced decoding

Some new decoding techniques have been added:

Multi-model REST server

This new server can serve translations from multiple models to cover more advanced use cases. See the related documentation for more details.

New retraining behavior

Previously, keeping the same vocabularies was a requirement for retraining a model (e.g. for domain adaptation). A new option, -update_vocab, now relaxes this constraint and offers several policies for updating the vocabularies used in the initial training with those defined by the new dataset.
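The post does not list the policies, so the following Python sketch only illustrates the general idea behind updating a vocabulary for retraining. The policy names `none`, `replace` and `merge` and the function `update_vocab` are assumptions for illustration, not a description of the actual -update_vocab values. The point it shows is that tokens present in both vocabularies can keep their trained embeddings, while new tokens need fresh ones.

```python
def update_vocab(old_vocab, new_vocab, policy):
    """Build the vocabulary used for retraining (illustrative policies only).

    old_vocab, new_vocab: lists of tokens, where a token's index is its position.
    Returns (vocab, old_ids): the updated vocabulary and, for each entry, the
    index of the same token in old_vocab, or None if it needs a fresh embedding.
    """
    if policy == "none":
        # Keep the initial vocabulary unchanged.
        vocab = list(old_vocab)
    elif policy == "replace":
        # Use only the new dataset's vocabulary.
        vocab = list(new_vocab)
    elif policy == "merge":
        # Keep old tokens first, then append unseen new ones (order-preserving dedup).
        vocab = list(dict.fromkeys(list(old_vocab) + list(new_vocab)))
    else:
        raise ValueError(f"unknown policy: {policy}")
    old_index = {w: i for i, w in enumerate(old_vocab)}
    # Shared tokens map to their old row, so their trained embedding can be copied.
    return vocab, [old_index.get(w) for w in vocab]


old = ["<unk>", "the", "cat"]
new = ["<unk>", "the", "dog"]
print(update_vocab(old, new, "replace"))
print(update_vocab(old, new, "merge"))
```

Returning the old indices alongside the vocabulary keeps the sketch independent of any tensor library: a training script could use them to copy the matching embedding rows and randomly initialize the rows marked `None`.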

Fixes and improvements

As usual, this release comes with bug fixes and improvements. See the changelog below for a complete list.


Thanks to contributors, bug reporters, and people testing and giving feedback. If you find a bug introduced in this release, please report it.

Is OpenNMT-py in sync with this release? How do we find the changelog for OpenNMT-py?

This only applies to OpenNMT (the LuaTorch version).

For OpenNMT-py, you can look directly at the GitHub history for recent changes, but there is no versioning workflow in place.