This was a long-awaited release.
Here are the main new features, with pointers to help you get started with them:
This feature enables new training approaches by removing the preprocessing step. You can now store all the training data in a directory and define patterns that match the filenames to use during training. Training will randomly sample sentences according to the weight assigned to each pattern and tokenize them on the fly.
This allows both working with a larger training set and fine-tuning the domain distribution of the selected examples.
More information can be found in the documentation.
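The weighted sampling described above can be sketched as follows. This is an illustrative sketch, not the actual configuration format or implementation: the glob-style patterns and the `sample_sentences` helper are assumptions made for the example.

```python
import glob
import random

# Hypothetical weighted patterns: (glob pattern, sampling weight).
# A pattern with weight 3.0 is sampled three times as often as one with 1.0.
patterns = [("data/news*.txt", 3.0), ("data/web*.txt", 1.0)]

def sample_sentences(patterns, n):
    """Draw n sentences: pick a pattern proportionally to its weight,
    then a random line from a random file matching that pattern."""
    weights = [w for _, w in patterns]
    sampled = []
    for _ in range(n):
        pattern, _ = random.choices(patterns, weights=weights, k=1)[0]
        files = glob.glob(pattern)
        if not files:
            continue
        with open(random.choice(files)) as f:
            lines = f.read().splitlines()
        if lines:
            sampled.append(random.choice(lines))
    return sampled
```

Because sentences are drawn per step rather than concatenated once up front, the effective domain mix can be changed by editing the weights, without rebuilding a preprocessed dataset.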
New tokenization features
As tokenization can now be applied on the fly, more features are needed to cover some specific use cases:
- New special characters to protect blocks of text from tokenization.
- The tokenizer can now call external normalization scripts.
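Hooking an external normalization script into an on-the-fly pipeline might look like the sketch below. The script path and its interface (raw text on stdin, normalized text on stdout) are assumptions for illustration, not the project's actual hook API:

```python
import subprocess

def normalize(sentence, script="./normalize.sh"):
    """Pipe the sentence through a hypothetical external script that
    reads stdin and writes the normalized text to stdout."""
    result = subprocess.run(
        [script], input=sentence, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def tokenize(sentence):
    # Stand-in for the real tokenizer: simple whitespace split.
    return sentence.split()

def preprocess(sentence, script="./normalize.sh"):
    # Normalization runs before tokenization, per sentence, on the fly.
    return tokenize(normalize(sentence, script))
```

The key point is ordering: normalization happens per sentence at training time, so the raw files on disk never need to be rewritten.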
New decoding features
Some new decoding techniques have been added:
- Inference now supports shallow fusion with a language model. This feature is used, for example, to replicate the “Listen, Attend and Spell” paper.
- The beam search now has new options to constrain its search lexically. This can be useful when working with placeholders that must appear in the target.
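In shallow fusion, the decoder combines the translation model's token log-probabilities with those of an external language model at each step. A minimal sketch of the scoring rule, with an illustrative fusion weight `lam` (the actual option names and default differ):

```python
def shallow_fusion_scores(tm_log_probs, lm_log_probs, lam=0.3):
    """Combine per-token log-probabilities from the translation model (TM)
    and the language model (LM): score = log p_TM + lam * log p_LM."""
    return [tm + lam * lm for tm, lm in zip(tm_log_probs, lm_log_probs)]

def best_token(tm_log_probs, lm_log_probs, lam=0.3):
    """Pick the index of the highest fused score (greedy single step;
    beam search applies the same rule to every hypothesis)."""
    scores = shallow_fusion_scores(tm_log_probs, lm_log_probs, lam)
    return max(range(len(scores)), key=scores.__getitem__)
```

With `lam = 0`, decoding falls back to the translation model alone; increasing it biases the search toward fluent continuations under the language model.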
Multi-model REST server
This new server supports serving translations from multiple models to cover more advanced use cases. See the related documentation for more details.
New retraining behavior
Previously, re-training a model (e.g. for domain adaptation) required keeping the same vocabularies. A new option
-update_vocab relaxes this constraint and offers several policies for updating the vocabularies used in the initial training with the ones defined by the new dataset.
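The idea behind such update policies can be sketched as follows. The policy names ("merge", "replace") are illustrative of the kind of choices involved, not necessarily the exact values accepted by -update_vocab:

```python
def update_vocab(old_vocab, new_vocab, policy="merge"):
    """Return an updated vocabulary list (hypothetical policies).

    merge:   keep all old tokens and append unseen new ones, so the
             embedding rows learned in the initial training stay valid.
    replace: use the new dataset's vocabulary as-is.
    """
    if policy == "replace":
        return list(new_vocab)
    if policy == "merge":
        seen = set(old_vocab)
        return list(old_vocab) + [t for t in new_vocab if t not in seen]
    raise ValueError(f"unknown policy: {policy}")
```

A merge-style policy matters for retraining: since old tokens keep their indices, the corresponding embedding and output-layer rows can be reused, and only the appended tokens need fresh parameters.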
Fixes and improvements
As usual, this release comes with bug fixes and improvements. See the changelog below for a complete list.
Thanks to contributors, bug reporters, and people testing and giving feedback. If you find a bug introduced in this release, please report it.