I am currently a bachelor's student trying my hand at NLP and machine translation (MT).
I need some guidance on the steps required to train a neural machine translation (NMT) model. As far as I know, these are the steps:
- Tokenizing both the source and the target language data.
- Applying BPE (byte-pair encoding) to the tokenized data.
- Running the preprocessing step on the BPE data (word embeddings, etc.).
- Training the model (LSTM or Transformer).
- Translating and decoding the BPE (merging subwords back into words).
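To make the first two steps concrete, here is a minimal toy sketch of tokenizing text and then learning and applying BPE merges: repeatedly merge the most frequent adjacent symbol pair. The regex tokenizer is an assumption standing in for a real tokenizer (e.g. Moses). The function names `tokenize`, `learn_bpe` and `apply_bpe` are hypothetical and do not belong to any library.

```python
import collections
import re


def tokenize(line):
    """Toy tokenizer (assumption, stand-in for Moses etc.): split words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", line)


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with the merged symbol 'ab'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    # A lambda avoids backslash-escape interpretation in the replacement string.
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge operations from a list of tokens."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = collections.Counter(" ".join(tok) + " </w>" for tok in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word, merges):
    """Segment one word by applying the learned merges in the order they were learned."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    lines = ["The newest model.", "The widest road."]
    tokens = [t for line in lines for t in tokenize(line.lower())]
    merges = learn_bpe(tokens, num_merges=10)
    print(apply_bpe("newest", merges))
```

Real toolkits learn merges on the full training corpus (typically tens of thousands of merges). They apply the same merge list to the training, validation and test data, so that source and target are segmented consistently.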
Please correct me if I am wrong.
Another question: do I need to detokenize the translated output?
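As a rough illustration of post-processing, assume BPE was applied with the subword-nmt convention of marking non-final subwords with `@@`. The markers are removed first, and then the tokenizer's splits are undone. In the sketch below, `remove_bpe` mirrors the commonly used `sed -r 's/(@@ )|(@@ ?$)//g'` command. `naive_detokenize` is a hypothetical, English-only heuristic standing in for a real detokenizer such as Moses' `detokenizer.perl`.

```python
import re


def remove_bpe(line):
    """Join subwords marked with '@@' (same substitution as the usual sed command)."""
    return re.sub(r"(@@ )|(@@ ?$)", "", line)


def naive_detokenize(line):
    """Hypothetical English-only heuristic: reattach punctuation to neighbouring words."""
    # Attach closing punctuation to the preceding token.
    line = re.sub(r" ([.,!?;:)])", r"\1", line)
    # Attach an opening parenthesis to the following token.
    line = re.sub(r"\( ", "(", line)
    return line


if __name__ == "__main__":
    hyp = "the trans@@ l@@ ation works ( mostly ) ."
    print(naive_detokenize(remove_bpe(hyp)))
```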
Many thanks in advance for the help.
Is there a blog post that walks through the NMT pipeline?