I am new to machine translation and am working on the Europarl dataset for English-French MT.
I preprocessed the data with the Moses tokenizer followed by BPE, and then trained a Transformer model as suggested in the OpenNMT documentation.
I am getting tokens such as `&apos;s` or `&quot;` (mentioned in the title), which look like HTML or XML tags (please correct me if I am wrong). They are produced by the Moses tokenizer.
Should I just remove them after translation using the Moses detokenizer?
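A minimal sketch of the after-translation option. It assumes the entities come from the default escape table of Moses's `tokenizer.perl` (the mapping below reflects my understanding of that table, so treat it as an assumption). The escaping is reversible, so the entities can be mapped back to the original characters instead of being deleted. The Moses detokenizer should also do this unescaping while rejoining tokens. `unescape_moses` and the sample sentence are hypothetical.

```python
import re

# Assumed default Moses escape table: entity -> original character.
MOSES_ENTITIES = {
    "&amp;": "&",
    "&#124;": "|",
    "&lt;": "<",
    "&gt;": ">",
    "&apos;": "'",
    "&quot;": '"',
    "&#91;": "[",
    "&#93;": "]",
}

# One alternation over all entities, so the text is scanned in a single pass.
_ENTITY_RE = re.compile("|".join(re.escape(e) for e in MOSES_ENTITIES))


def unescape_moses(text: str) -> str:
    """Replace Moses-style entities with the characters they stand for.

    A single regex pass means replaced text is never re-scanned, so a
    literal "&amp;quot;" correctly becomes "&quot;" rather than '"'.
    """
    return _ENTITY_RE.sub(lambda m: MOSES_ENTITIES[m.group(0)], text)


if __name__ == "__main__":
    hyp = "the Commission &apos;s proposal is &quot; unacceptable &quot; ."
    print(unescape_moses(hyp))
    # the Commission 's proposal is " unacceptable " .
```

Note that the tokens stay space-separated. Rejoining `'s` and the quotes with their neighbours is still the detokenizer's job.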
Alternatively, should I use an HTML parser to remove them before training?
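For the before-training option, a full HTML parser is probably unnecessary. These are character references, not markup, and Python's standard-library `html.unescape` converts them back to characters without stripping anything. This minimal sketch assumes the Moses-tokenized corpus is processed line by line before BPE is learned and applied. `unescape_lines` and the sample text are hypothetical.

```python
import html
import io
from typing import Iterable, Iterator


def unescape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Undo entity escaping line by line, e.g. before running BPE.

    html.unescape handles named references (&apos; &quot; &amp; &lt; &gt;)
    and numeric ones (&#124; &#91; &#93;) in a single pass; the tokens
    themselves are kept, only their escaped characters are restored.
    """
    for line in lines:
        yield html.unescape(line)


if __name__ == "__main__":
    # Hypothetical stand-in for a Moses-tokenized Europarl file.
    corpus = io.StringIO("it &apos;s a &quot; test &quot; .\nA &amp; B\n")
    for out in unescape_lines(corpus):
        print(out, end="")
```

With this approach, the restored `'` and `"` characters reach BPE as ordinary symbols, so the same treatment should be applied to the training data and to any text translated later.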