translate_batch(): incompatible function arguments for CTranslate2

aquorio15 · January 17, 2023, 2:39pm

I got this piece of code from @ymoslem and I was trying to run my model with it, but I am getting an "incompatible function arguments" error.

import ctranslate2
import sentencepiece as spm

source_file_path = "demo.txt"
target_file_path = "1.txt"

sp_source_model_path = "/content/drive/MyDrive/MT/Eng-Lam/vocab.src.model"
sp_target_model_path = "/content/drive/MyDrive/MT/Eng-Lam/vocab.tgt.model"
ct_model_path = "/content/drive/MyDrive/MT/Eng-Lam/model"

sp = spm.SentencePieceProcessor()
sp.load(sp_source_model_path)

with open(source_file_path, "r") as source:
    lines = source.readlines()

source_sents = [line.strip() for line in lines]
source_sents_subworded = sp.encode(source_sents)

translator = ctranslate2.Translator(ct_model_path, device="cpu") # or "cuda" for GPU
translations = translator.translate_batch(source_sents_subworded, batch_type="tokens", max_batch_size=4096)
translations = [translation[0]['tokens'] for translation in translations]

sp.load(sp_target_model_path)
translations_desubword = sp.decode(translations)

with open(target_file_path, "w+", encoding="utf-8") as target:
    for line in translations_desubword:
        target.write(line.strip() + "\n")

print("Done")

And this is the error I am getting:

     27 # Translate the source sentences
     28 translator = ctranslate2.Translator(ct_model_path, device="cpu")  # or "cuda" for GPU
---> 29 translations = translator.translate_batch(source_sents_subworded, batch_type="tokens", max_batch_size=4096)
     30 translations = [translation[0]['tokens'] for translation in translations]
     31 

TypeError: translate_batch(): incompatible function arguments. The following argument types are supported:
    1. (self: ctranslate2._ext.Translator, source: List[List[str]], target_prefix: Optional[List[Optional[List[str]]]] = None, *, max_batch_size: int = 0, batch_type: str = 'examples', asynchronous: bool = False, beam_size: int = 2, num_hypotheses: int = 1, length_penalty: float = 1, coverage_penalty: float = 0, repetition_penalty: float = 1, no_repeat_ngram_size: int = 0, disable_unk: bool = False, suppress_sequences: Optional[List[List[str]]] = None, end_token: Optional[str] = None, prefix_bias_beta: float = 0, max_input_length: int = 1024, max_decoding_length: int = 256, min_decoding_length: int = 1, use_vmap: bool = False, return_scores: bool = False, return_attention: bool = False, return_alternatives: bool = False, min_alternative_expansion_prob: float = 0, sampling_topk: int = 1, sampling_temperature: float = 1, replace_unknowns: bool = False) -> Union[List[ctranslate2._ext.TranslationResult], List[ctranslate2._ext.AsyncTranslationResult]]

Invoked with: <ctranslate2._ext.Translator object at 0x7f6c52bd3170>, [[1299, 131, 8, 151, 3131, 9, 34, 1960, 31, 4, 206, 17, 1656, 3263, 5, 391, 34, 257, 3]]; kwargs: batch_type='tokens', max_batch_size=4096

Can anyone help me with this?

ymoslem · January 18, 2023, 12:25am

Hi Amartya,

This should be either sp.encode(source_sents, out_type=str) or sp.encode_as_pieces(source_sents)

In the past, I had this as "str" which was supported; now it should be str without quotes. While trying to fix it, you might have deleted the out_type option altogether, which converted the text into IDs rather than tokens.
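To catch this before the call reaches CTranslate2, a small check along these lines can help. It is only a sketch: check_tokenized_batch is a hypothetical helper name, not part of CTranslate2 or SentencePiece, and the string tokens in the second call are made up for illustration.

```python
def check_tokenized_batch(batch):
    """Raise a TypeError unless batch is a list of lists of str tokens.

    Hypothetical helper (not a CTranslate2 or SentencePiece API); it mirrors
    the List[List[str]] type that translate_batch() reports for `source`.
    """
    if not isinstance(batch, list):
        raise TypeError("expected a list of tokenized sentences")
    for i, sentence in enumerate(batch):
        if not isinstance(sentence, list):
            raise TypeError(f"sentence {i} is not a list of tokens")
        for token in sentence:
            if not isinstance(token, str):
                raise TypeError(
                    f"sentence {i} contains {type(token).__name__} {token!r}; "
                    "encode with out_type=str to get string pieces"
                )
    return batch


# IDs, as in the "Invoked with" part of the error message: rejected.
try:
    check_tokenized_batch([[1299, 131, 8]])
except TypeError as err:
    print(err)

# String pieces (made-up example tokens): accepted.
check_tokenized_batch([["▁Hello", "▁world"]])
```

Running it on source_sents_subworded right after the sp.encode call would point at the missing out_type=str instead of the long signature dump.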

In addition, please note that CTranslate2 now has a new way to retrieve tokens, so this line should be updated to:

translations = [translation.hypotheses[0] for translation in translations]
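Putting the two changes together, the core of the script could look roughly like the sketch below. The function name translate_lines and its parameters are my own for illustration; it assumes translator is a ctranslate2.Translator and that sp_source and sp_target are SentencePieceProcessor objects loaded with the source and target models.

```python
def translate_lines(lines, translator, sp_source, sp_target):
    """Translate plain-text lines and return detokenized strings.

    Assumed inputs: `translator` is a ctranslate2.Translator, and
    `sp_source` / `sp_target` are sentencepiece.SentencePieceProcessor
    objects loaded with the source and target models.
    """
    source_sents = [line.strip() for line in lines]
    # out_type=str returns subword pieces (strings) instead of IDs.
    source_tokens = sp_source.encode(source_sents, out_type=str)
    results = translator.translate_batch(
        source_tokens, batch_type="tokens", max_batch_size=4096
    )
    # Each result holds its hypotheses as lists of tokens; take the first one.
    target_tokens = [result.hypotheses[0] for result in results]
    return sp_target.decode(target_tokens)


# Usage sketch (variable names are the ones from the question, not verified here):
# import ctranslate2
# import sentencepiece as spm
# translator = ctranslate2.Translator(ct_model_path, device="cpu")
# sp_source = spm.SentencePieceProcessor(model_file=sp_source_model_path)
# sp_target = spm.SentencePieceProcessor(model_file=sp_target_model_path)
# with open(source_file_path, encoding="utf-8") as source:
#     translated = translate_lines(source.readlines(), translator, sp_source, sp_target)
```

Using two separate processors avoids reloading the target model into the same sp object halfway through the script.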

I fixed the code here. If it is mentioned elsewhere, please let me know, and I will fix it. Thanks!

https://gist.github.com/ymoslem/60e1d1dc44fe006f67e130b6ad703c4b

CTranslate2-example.py

# First convert your OpenNMT-py or OpenNMT-tf model to a CTranslate2 model.
# pip3 install ctranslate2
# • OpenNMT-py:
# ct2-opennmt-py-converter --model_path model.pt --output_dir enja_ctranslate2 --quantization int8
# • OpenNMT-tf:
# ct2-opennmt-tf-converter --model_path model --output_dir enja_ctranslate2 --src_vocab source.vocab --tgt_vocab target.vocab --model_type TransformerBase --quantization int8


import ctranslate2
import sentencepiece as spm

This file has been truncated; see the gist link above for the full script.

Kind regards,
Yasmin