Using CTranslate2 in itself will give a BLEU score different from that of the original PyTorch model, and every option you add can change it further. “Different” does not always mean worse; even when it does, the difference is insignificant.
Hello, I can’t convert my model to CTranslate2:
ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --output_dir ende_ctranslate2 --quantization int8
but I still get errors.
Yes, it was the model from the tutorial, but I also downloaded an already converted model from your GitHub (the desktop software project) and got this error:
(mon_env) PS C:\Users\MOI\Documents\Openmt\CTranslate-NMT-Web-Interface> streamlit run translate.py
You can now view your Streamlit app in your browser.
2023-12-30 15:45:23.353 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\MOI\Documents\Openmt\mon_env\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
File "C:\Users\MOI\Documents\Openmt\CTranslate-NMT-Web-Interface\translate.py", line 35, in <module>
    translator = ctranslate2.Translator(ct_model_path, "cpu")  # or "cuda" for GPU
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Unable to open file 'model.bin' in model '/model/ct2_model'
If you think it is the model, then you can try the NLLB model (it requires a different way to translate, though). However, as you are on Windows, I think this path might be incorrect (maybe you need to prefix the path string with r to make it a raw string). I am afraid I do not have Windows here to double-check. You can try a very simple piece of code, like opening a text file you created yourself, and see whether it works.
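For example, a quick sanity check on the path might look like this (the path below is hypothetical; in Python, the r prefix stops backslashes being treated as escape sequences, which matters for Windows paths like \Users):

```python
from pathlib import Path

# Hypothetical Windows path; the r prefix keeps "\U", "\n", etc. literal
ct_model_path = r"C:\Users\MOI\Documents\Openmt\ct2_model"

# The converted CTranslate2 model directory must contain a model.bin file
print((Path(ct_model_path) / "model.bin").exists())
```

If this prints False, the path (or the conversion output) is the problem, not CTranslate2 itself.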
Now it works; I installed Ubuntu and redid the project.
Now I’m going to start learning how to train models and convert them with CTranslate2 to have more personalised models.
Thank you again for your tutorial.
Hi @ymoslem, thanks for your tutorial, and thanks also for the comments from other people. Based on them, I have posted an MT system demo for ENG<>FRE, using cleaned corpora from the UN and European Community corpora, getting results as good as (and probably better than) Google (or DeepL).
I use my own tokenizer (not explained here), but again, IMHO this is probably the fastest way to deploy something you could share with others.
Thank you for sharing the code. I am new to the NMT field and need to clear up my confusion. Please help.
After successfully training English to Hindi and obtaining the *.pt file, I want to build a translator webpage using Flask or Streamlit.
After reading the discussions, I have come to the following conclusions:
The *.pt file should be converted using ctranslate2 to get the model.bin file.
I should have SentencePiece (spm) models for both the target and source languages.
My questions are:
How can I build the spm model if I only have the *.pt file?
What is the significance of the onmt_build_vocab -config command and the files it generates?
What is the difference between SentencePiece’s *.vocab file and the *.vocab file produced by onmt_build_vocab -config?
According to the code provided, we should feed both source.model and target.model, but the pretrained models on ONMT’s page have only a single sentencepiece.model file (for Transformers). I am confused and can’t move forward. Could you please clear up my confusion?
If you don’t have a SentencePiece model, then probably you have followed a different tokenisation approach.
If a pre-trained model comes with only one SentencePiece model, then it uses a shared vocabulary between the source and target. This means the same SentencePiece model can be used for tokenisation of both the source and the target.