I am using OpenNMT-tf. After training I get checkpoints such as ckpt-1.data-00001-of-00002.
How do I load these checkpoints for inference? Loading them raises an error. My commands for training and inference are:
training: !onmt-main --model_type NMTMediumV1 --auto_config --config /content/data.yml train --num_gpus 1
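For reference, the inference counterpart of that command might look like the sketch below. This is an assumption based on the standard `onmt-main` CLI of OpenNMT-tf 2.x, and the paths are placeholders; note that a checkpoint is referenced by its prefix (e.g. `ckpt-1`) or by the model directory, never by an individual `.data-00001-of-00002` shard file.

```python
# Hypothetical inference invocation, built as an argument list so each
# placeholder path stands out. Adjust paths to your own Colab setup.
cmd = [
    "onmt-main",
    "--config", "/content/data.yml",
    "--auto_config",
    "infer",
    "--features_file", "/content/src-test.txt",
    "--predictions_file", "/content/pred.txt",
    # Optional: point at a specific checkpoint *prefix* (not a shard file).
    # "--checkpoint_path", "/content/run/ckpt-1",
]
print(" ".join(cmd))
```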
I have a few things to ask, if you can help.
We are building a production-ready es-en NMT engine:
1. Is there any pretrained model available?
2. In OpenNMT-tf, how do I map an unk token back to the original source token during translation?
For example, if a word is not in the vocabulary, it gets replaced with an unk token; instead of assigning a random translation to that token, the engine should copy the source token into the translated text.
3. Can you help me define an optimal es-en training configuration? Which model do you suggest, and for how many steps?
4. How do I run inference on a single query instead of passing a file?
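For question 4, one simple approach is to stage the single sentence in a temporary file and run inference on that file. The sketch below assumes an object exposing an `infer(features_file)` method, such as OpenNMT-tf's Python `Runner`; treat that API as an assumption and check it against your installed version's docs.

```python
import os
import tempfile

def infer_single(runner, sentence):
    """Run inference on one sentence by staging it in a temporary file.

    `runner` is any object with an infer(features_file) method, e.g. an
    OpenNMT-tf Runner (assumed API; verify against your version).
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(sentence + "\n")
        path = f.name
    try:
        return runner.infer(path)
    finally:
        os.remove(path)
```

Remember that the sentence must be preprocessed (tokenized/BPE-encoded) the same way as the training data before being passed in.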
I hope you can help me build a good NMT engine; I have already tried different architectures but still do not have a production-ready setup.
Can we use BPE for handling rare words? Also, if a number is passed to the engine, is there a way to mask it so that it stays unchanged in the target translation, i.e. no translation happens for that token?
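Numbers can be kept unchanged independently of the toolkit by masking them with a placeholder token before translation and restoring them afterwards. A minimal sketch follows; the placeholder string ⦅num⦆ is an arbitrary choice of mine — just make sure it survives your tokenization as a single token and is in the vocabulary.

```python
import re

NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")
PLACEHOLDER = "⦅num⦆"  # arbitrary single-token placeholder

def mask_numbers(text):
    """Replace every number with a placeholder; return masked text and numbers."""
    numbers = NUM_RE.findall(text)
    return NUM_RE.sub(PLACEHOLDER, text), numbers

def unmask_numbers(translated, numbers):
    """Re-insert the original numbers, in order, into the translated text."""
    parts = translated.split(PLACEHOLDER)
    out = parts[0]
    for number, part in zip(numbers, parts[1:]):
        out += number + part
    return out

masked, nums = mask_numbers("transfiere 1500,50 euros antes del 15 de junio")
# masked: "transfiere ⦅num⦆ euros antes del ⦅num⦆ de junio"
# nums:   ["1500,50", "15"]
```

In production you also have to handle the case where the model drops, duplicates, or reorders placeholders, so the placeholder count in the output may not match the stored numbers.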
The link below describes the following; can this be achieved in OpenNMT-tf?
1- Add -replace_unk to the translation command, and it will replace the tag with the original word, i.e. it will keep it untranslated.
2- Add -phrase_table to the translation command followed by a dictionary file path to replace the tag with a translation from the file. So the -replace_unk option should be there as well.
The phrase table file should include a single translated word (token) per line in the format: source|||target
Is this functionality available in the TensorFlow version (OpenNMT-tf)? If so, how do I use it at inference time?
Look for replace_unknown_target in https://opennmt.net/OpenNMT-tf/configuration.html. However, this was mainly useful for word-based translation with RNNs, which has been superseded by subword tokenization and Transformer where this option is no longer relevant.
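For completeness, the option mentioned above is enabled in the run configuration. A sketch follows; I am assuming it lives under the `params` section, and as noted it only applies to word-level models with attention, so check the configuration page linked above for your version.

```yaml
# run configuration fragment (assumption: option sits under params)
params:
  replace_unknown_target: true
```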
Here is what I have done; please correct me if I have done something wrong, as I am new to this.
After loading the data, my code learns the BPE model and then applies it. From the BPE'd data I create the vocabulary and run the training, but I did not use any subword settings during training, and I am using the configuration below. So please tell me how to handle unk tokens by copying the source word into the target:
what do I need to change or add in the parameter configuration, given that I have already applied BPE before training?
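If BPE is applied offline (before training), nothing BPE-specific needs to be added to the parameters: the model simply trains on the already-encoded files, as long as the vocabularies were built from those same files (e.g. with `onmt-build-vocab`). A data.yml sketch under that assumption, with all paths as placeholders:

```yaml
model_dir: /content/run

data:
  # These files are already BPE-encoded, and the vocabularies below
  # were built from them.
  train_features_file: /content/train.bpe.es
  train_labels_file: /content/train.bpe.en
  eval_features_file: /content/valid.bpe.es
  eval_labels_file: /content/valid.bpe.en
  source_vocabulary: /content/src-vocab.txt
  target_vocabulary: /content/tgt-vocab.txt
```

At inference time, apply the same BPE model to the input and de-BPE the output. Note that with BPE, true unk tokens become rare anyway, since unseen words decompose into known subwords.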
I hope you will answer; thanks in advance for your support.