Just in case some people wonder: the new “open-weight” LLM Mistral-7B is fully compatible with our Llama(2)-7B support.
Use the llama converter to convert it to the OpenNMT-py format (download the checkpoint from the Mistral website).
You can then finetune and run inference the same way as with Llama(2)-7B.
The only slight difference is the so-called “sliding window” attention mask, but it has no impact on sequences shorter than 4096 tokens, which is (for most use cases) longer than typical finetuning datasets (Alpaca, Vicuna, OASST) or eval datasets like MMLU.
We will still add a new option to support this sliding window, but the impact is minimal if not nil.
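To make the point concrete, here is a minimal sketch of what a sliding-window causal mask looks like. This is a hypothetical helper written for illustration, not OpenNMT-py's actual implementation: position i may attend to the last `window` positions up to and including itself.

```python
# Illustrative sliding-window causal attention mask (hypothetical helper,
# not taken from OpenNMT-py). True means "query i may attend to key j".
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    # Causal constraint (j <= i) plus the window constraint (j > i - window).
    return (j <= i) & (j > i - window)

# For sequences no longer than the window, this degenerates to a plain
# causal (lower-triangular) mask -- which is why sequences shorter than
# 4096 tokens are unaffected.
short = sliding_window_mask(10, 4096)
assert np.array_equal(short, np.tril(np.ones((10, 10), dtype=bool)))
```

This is exactly why finetuning on the datasets above is unaffected: their examples rarely approach 4096 tokens, so the sliding window never clips anything.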
NB1: the MMLU score I got is 61 (close to the 60ish of their report), which is very high for a 7B model and raises some questions about potential data contamination. In any case the model is very good, and since it is 100% compatible with Llama architecture-wise, the difference must come from the training datasets (about which they did not release any information).
NB2: if you intend to use Mistral-7B-Instruct with CTranslate2, it should also work out-of-the-box with the llama converter (for this, use the Hugging Face format checkpoint).
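For the CTranslate2 route, the standard `ct2-transformers-converter` CLI should handle the Hugging Face checkpoint. A sketch (the model ID and output directory are examples; adjust to the checkpoint you actually downloaded):

```shell
# Convert the Hugging Face checkpoint to CTranslate2 format.
# Requires: pip install ctranslate2 transformers
ct2-transformers-converter \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --output_dir mistral-7b-instruct-ct2 \
    --quantization int8
```

The `--quantization int8` flag is optional; drop it to keep the original precision.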