Hello,
I am trying to use the alignment feature (with on-the-fly tokenization).
I have prepared my alignments using fast_align (in Pharaoh format).
In the YAML file, I have set the following parameters:
data:
  train_alignments: train-alignment.txt
params:
  guided_alignment_type: ce
  guided_alignment_weight: 1
infer:
  with_alignments: hard
Then I trained the model with guided alignment.
After training, I ran the translation command. However, the result I get is not accurate at all. In fact, I notice that there are more alignments than expected: some alignment pairs refer to token positions that do not exist in the sentences.
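To make the symptom concrete, here is a minimal sketch of the kind of check that shows it. It assumes whitespace-separated tokens and 0-based `i-j` pairs (as fast_align writes them); the function name `out_of_range_pairs` is hypothetical and not part of OpenNMT-tf or fast_align.

```python
def out_of_range_pairs(source_line, target_line, alignment_line):
    """Return the Pharaoh pairs whose indices exceed the token counts.

    Assumes tokens are separated by whitespace and indices are 0-based.
    """
    n_src = len(source_line.split())
    n_tgt = len(target_line.split())
    bad = []
    for pair in alignment_line.split():
        # Each Pharaoh pair has the form "srcIndex-tgtIndex".
        src_idx, tgt_idx = (int(x) for x in pair.split("-"))
        if src_idx >= n_src or tgt_idx >= n_tgt:
            bad.append((src_idx, tgt_idx))
    return bad


# Example: the pair 2-1 points past the end of a two-token source sentence.
print(out_of_range_pairs("hello world", "bonjour monde", "0-0 2-1"))
```

Running this over the tokenized source, the tokenized target, and the alignment output line by line lists every pair that does not correspond to an existing token.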
Do you know what might be happening and how I can solve this?
Thank you.