Once the parallel data has been extracted for the subword model (rsennrich/subword-nmt on GitHub: Unsupervised Word Segmentation for Neural Machine Translation and Text Generation), how do I add it to the configuration file and build the vocabulary?
The next question is how to translate the test data. Should the test data also be segmented into subwords before translation, and how do I undo this segmentation after translation? (sed -r 's/(@@ )|(@@ ?$)//g')
Please confirm whether this BPE step is correct.
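To make the restoration step concrete, here is a minimal Python sketch of what I understand the sed expression above to do. It assumes the segmented text uses the `@@` separator (the subword-nmt default); the function name `restore_bpe` is only for illustration and is not part of subword-nmt.

```python
import re

# Same pattern as the sed expression: drop "@@ " inside a line,
# and drop a trailing "@@" (optionally followed by a space) at the end.
BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def restore_bpe(line: str) -> str:
    """Join subword units back into whole words (hypothetical helper)."""
    return BPE_MARKER.sub("", line)


if __name__ == "__main__":
    segmented = "the new@@ er tre@@ es are gro@@ w@@ ing"
    print(restore_bpe(segmented))  # the newer trees are growing
```

If this is right, the translated output would be passed through this step line by line before scoring against the unsegmented reference.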