"Appropriate" way to translate from NOT a file

juanshlm · December 1, 2017, 8:47pm

I’m trying to use ONMT-py from within a small app. In previous versions (I’m talking about commit 26421ce20c6b626ceacafbb3282cad1d5dce04ca) I could simply do the following:

def translate(input_statement):
    translator = onmt.Translator(translator_args)
    response, score_raw, _, _, _ = translator.translate(
        srcBatch=[input_statement.text.split()],
        goldBatch=[])
    return response

And I would be able to get a translation from a string.

Now I’m trying to do the same with the latest version in the repo, and I’m not sure what the best way is to avoid depending on a file. The Translator.translate() method takes a batch object, which comes from an onmt.IO.OrderedIterator, which in turn comes from an onmt.IO.ONMTDataset, which requires a file to load the source from.

Is there a simpler way to get to that same object, one that does not involve reading from a file, so I can use the translate() method? What am I missing?

Or is there an easy way to generate the ONMTDataset object from just a string? It seems a bit convoluted, but as far as I understand the structure, it’s either that or writing a custom translate().

I did look for documentation and searched the forum, but couldn’t find anything.

Thanks!

JianyuZhan · December 4, 2017, 8:42am

Hi, your proposed method is the only feasible (though, yes, rather cumbersome) way for now.

Previously, users have asked for support for translating a single sentence. We thought
it was easy enough to fake the input as a file and use the current interface, so we didn’t add that support.
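
The "fake it as a file" workaround mentioned above could look roughly like the following sketch. It uses only the Python standard library; the helper name write_sentences_to_temp_file is hypothetical, and how the resulting path is handed to the translator depends on the version in use, so that step is deliberately left out.

```python
import os
import tempfile


def write_sentences_to_temp_file(sentences):
    """Write one sentence per line to a temporary UTF-8 file; return its path.

    Hypothetical helper for illustration, not part of ONMT-py.
    """
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8")
    with handle:  # closes the file on exit so other code can open it
        for sentence in sentences:
            handle.write(sentence.strip() + "\n")
    return handle.name


path = write_sentences_to_temp_file(["Whatever I need to translate"])
try:
    # `path` can now be used wherever a source file is expected.
    # Here the file is simply read back to show what was written.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
finally:
    os.remove(path)  # delete=False means the caller must clean up
```

The obvious cost is a disk round trip per request, which is why a string-based dataset is attractive for an app that translates one sentence at a time.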

But your requirement seems reasonable. Maybe later, when I refactor the translation part of
the code, I will implement this.

Thanks.

juanshlm · December 4, 2017, 8:42pm

For whatever it’s worth (maybe it’ll help someone else), this is what I’m doing:

I created a subclass of ONMTDataset with only what I needed:

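# Assumes torchtext is installed and that ONMTDataset and peek are
# importable from onmt.IO in this version of the code.
import torchtext
from onmt.IO import ONMTDataset, peek


class StringONMTDataset(ONMTDataset):
    def __init__(self, src_path, fields):
        # Despite its name, src_path is the raw source string, not a path.
        self.src_vocabs = []
        self.n_src_feats = 0
        self.n_tgt_feats = 0
        # A single example holding the whitespace-tokenized source.
        src_base = [{"src": tuple(src_path.split()),
                     "indices": 0}]
        examples = (x for x in src_base)

        examples = self.dynamic_dict(examples)

        # Look at the first example to find out which fields are present.
        ex, examples = peek(examples)
        keys = ex.keys()

        fields = [(k, fields[k]) for k in keys]
        example_values = ([ex[k] for k in keys] for ex in examples)
        out_examples = (torchtext.data.Example.fromlist(ex_values, fields)
                        for ex_values in example_values)

        # Skip ONMTDataset.__init__ (which reads from files) and call the
        # next initializer in the MRO directly.
        super(ONMTDataset, self).__init__(
            out_examples,
            fields,
            None
        )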
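
The class above relies on a peek helper to inspect the first example without losing it from the generator. A minimal sketch of what such a helper is assumed to do, written with itertools.chain, is shown here. It is an illustration of the technique, not necessarily the project's own implementation.

```python
from itertools import chain


def peek(iterable):
    """Return the first item and an iterator that still yields every item."""
    iterator = iter(iterable)
    first = next(iterator)  # raises StopIteration if the input is empty
    # Put the consumed item back in front of the remaining ones.
    return first, chain([first], iterator)


# Toy examples shaped like the dicts built in the dataset class.
examples = ({"src": ("hello", "world"), "indices": i} for i in range(2))
first, examples = peek(examples)
keys = list(first.keys())
remaining = list(examples)
```

After the call, the returned iterator still contains the first example, so nothing is dropped when the examples are later converted into torchtext objects.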

With that in place, I just initialize the object:

line = "Whatever i need to translate"
# Build a one-example dataset straight from the string.
data = StringONMTDataset(line, translator.fields)
test_data = onmt.IO.OrderedIterator(
    dataset=data, device=translator_args.gpu,
    batch_size=translator_args.batch_size,
    train=False, sort=False, shuffle=False)
test_data.create_batches()
# Wrap the first (and only) batch of examples in a torchtext Batch.
prepared_string = torchtext.data.Batch(test_data.batches[0], data,
                                       translator_args.gpu, False)
pred_batch, gold_batch, pred_scores, gold_scores, attn, src \
    = translator.translate(prepared_string, data)

Maybe not too clean, but it works…

ugy · March 10, 2018, 10:14pm

Hi, I am wondering if there is any new progress on this problem. I spent some time on it, but with no success. @juanshlm’s solution does not work anymore because of updates to the project. I tried to adapt it, but it does not work yet. Any help will be appreciated. I believe this can be an important feature for anyone who tries to use the project in a demo application rather than in experiments. Thanks

srush · March 11, 2018, 3:19am

Agreed. We will try to support this better.

ugy · March 12, 2018, 3:13pm

Thanks for the update. I am looking forward to it.