Hello everyone,
Since I’m fairly new to neural machine translation (NMT), I’ve been investigating various frameworks and tools for building custom translation models. I chose OpenNMT because of its extensive feature set and active community.
I’m working on a project that involves translating specialised technical engineering documents. These documents contain a lot of domain-specific vocabulary and complex sentence structures.
Here are some details regarding my project and the difficulties I’m having:
Dataset Preparation: I have a bilingual collection of technical documents, but it is fairly small.
Which techniques work best for augmenting this dataset to make the translations more accurate? Are there any helpful pre-processing steps or recommended data augmentation techniques?
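As an illustration of the pre-processing side of this question, here is a minimal sketch of typical parallel-corpus cleaning filters: Unicode normalization, dropping pairs with an empty side or overly long segments, dropping pairs with a suspicious length ratio (likely misalignments), and removing exact duplicates. The function names and thresholds (`max_len=200`, `max_ratio=3.0`) are hypothetical, illustrative choices, not OpenNMT defaults. Augmentation methods such as back-translation need a trained reverse-direction model and are not shown.

```python
import unicodedata


def normalize(text):
    # NFKC folds compatibility characters (e.g. full-width digits) into standard forms
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace into single spaces and trim both ends
    return " ".join(text.split())


def clean_parallel(pairs, max_len=200, max_ratio=3.0):
    """Return cleaned (source, target) pairs; thresholds are illustrative only."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # drop pairs with an empty side
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # drop overly long segments
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop pairs whose lengths differ too much (likely misaligned)
        if (src, tgt) in seen:
            continue  # drop exact duplicates (after normalization)
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    # Made-up example pairs, for illustration only
    pairs = [
        ("Tighten the  bolt.", "Serrez le boulon."),
        ("Tighten the bolt.", "Serrez le boulon."),
        ("", "Vide"),
        ("Check torque", "Vérifiez le couple de serrage indiqué dans le tableau de maintenance ci-dessous"),
    ]
    print(clean_parallel(pairs))
```

Here the second pair becomes a duplicate of the first after whitespace normalization, the third has an empty source, and the fourth fails the length-ratio filter.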
Model Selection: Given the technical content of the documents, should I use the standard Transformer model, or are there particular modifications or configurations that work best for translating technical text?
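One configuration choice that often matters for terminology-heavy text is subword segmentation, which lets the model build rare technical terms out of smaller, more frequent units. Below is a toy byte-pair encoding (BPE) learner written only to show the idea; in practice this is normally handled by a dedicated tool such as SentencePiece or subword-nmt, and the corpus and merge count here are made up.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, so word-final units are learned separately


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols`, left to right."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of word tokens."""
    vocab = Counter(tuple(word) + (END,) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    # Made-up toy corpus: a frequent term and its plural
    corpus = ["bearing"] * 5 + ["bearings"] * 3
    merges = learn_bpe(corpus, num_merges=10)
    print(apply_bpe("bearings", merges))
    print(apply_bpe("gearing", merges))  # unseen word falls back to smaller units
```

The end-of-word marker is a design choice borrowed from the original BPE-for-NMT approach; it keeps a unit that ends a word distinct from the same characters inside a word.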
Training Process: Which key hyperparameters should I focus on tuning to improve translation accuracy for technical documents? Are there any recommended starting values for them?
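The learning-rate schedule is one of the hyperparameters that most affects Transformer training. As a sketch, this is the inverse-square-root schedule with linear warmup from the original Transformer paper, which OpenNMT-py offers as its `noam` decay method. The constant `factor` is an assumption added here to mirror how a base learning rate is commonly used as a multiplier, and the defaults (`d_model=512`, `warmup_steps=4000`, `factor=2.0`) are illustrative starting points, not values tuned for this dataset.

```python
def noam_lr(step, d_model=512, warmup_steps=4000, factor=2.0):
    """Learning rate at a given (1-indexed) training step."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    # Linear warmup (step * warmup^-1.5) until step == warmup_steps,
    # then inverse-square-root decay (step^-0.5); the min() picks the active phase.
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for s in (1, 1000, 4000, 8000, 50000):
        print(f"step {s:>6}: lr = {noam_lr(s):.6f}")
```

The peak is reached exactly at `warmup_steps`, where both terms inside `min()` are equal. Other hyperparameters commonly tuned alongside it include dropout, label smoothing and batch size; with a small corpus, a smaller model or higher dropout is often worth trying.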
Evaluation Metrics: Besides BLEU, are there other metrics that are particularly useful for evaluating how well a translation model performs on technical texts?
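For technical texts, one simple complement to BLEU is a terminology check: of the glossary terms that occur in the source, how many are rendered with the required target term? Below is a minimal sketch of such a metric, assuming a hypothetical source-to-target glossary and plain case-insensitive substring matching (so inflected forms and partial-word matches are not handled carefully). Established alternatives such as chrF and TER are available in sacrebleu.

```python
from typing import Dict, List, Optional


def term_accuracy(sources: List[str], hypotheses: List[str],
                  glossary: Dict[str, str]) -> Optional[float]:
    """Share of glossary terms found in the sources whose required
    target term also appears in the corresponding hypothesis.

    Uses case-insensitive substring matching; returns None if no
    glossary term occurs in any source sentence.
    """
    hits = total = 0
    for src, hyp in zip(sources, hypotheses):
        src_l, hyp_l = src.lower(), hyp.lower()
        for src_term, tgt_term in glossary.items():
            if src_term.lower() in src_l:
                total += 1
                if tgt_term.lower() in hyp_l:
                    hits += 1
    return hits / total if total else None


if __name__ == "__main__":
    # Hypothetical glossary and sentences, for illustration only
    glossary = {"torque wrench": "clé dynamométrique", "bearing": "roulement"}
    sources = ["Use a torque wrench.", "Replace the bearing."]
    hypotheses = ["Utilisez une clé dynamométrique.", "Remplacez le palier."]
    print(term_accuracy(sources, hypotheses, glossary))  # 0.5
```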
Post-processing and Fine-Tuning: What are the most effective methods for fine-tuning the trained model on new data? In addition, are there any post-processing methods that can help improve the translated output?
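Two small sketches related to this question, both using hypothetical helper names. The first builds a fine-tuning set by mixing the new data with a random sample of the original training data (sometimes called mixed fine-tuning), a common way to reduce forgetting when continuing training from a checkpoint, for example with OpenNMT-py’s `train_from` option. The second is a simple post-processing check that flags numbers present in the source but missing from the translation; it assumes `,` and `.` are interchangeable decimal separators, which will not hold for every language pair or for thousands separators.

```python
import random
import re
from collections import Counter


def build_finetune_mix(new_pairs, old_pairs, old_ratio=1.0, seed=0):
    """Mix all new pairs with a random sample of old pairs.

    old_ratio is the number of old pairs drawn per new pair (an illustrative value).
    """
    rng = random.Random(seed)
    k = min(len(old_pairs), int(len(new_pairs) * old_ratio))
    mix = list(new_pairs) + rng.sample(list(old_pairs), k)
    rng.shuffle(mix)  # interleave new and old examples
    return mix


# Integers or decimals such as 25, 3.5 or 3,5
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def _numbers(text):
    # Treat "3,5" and "3.5" as the same value (assumption, see above)
    return Counter(n.replace(",", ".") for n in NUMBER_RE.findall(text))


def missing_numbers(source, translation):
    """Numbers that occur in the source more often than in the translation."""
    return sorted((_numbers(source) - _numbers(translation)).elements())


if __name__ == "__main__":
    new = [("a", "A"), ("b", "B")]
    old = [("x", "X"), ("y", "Y"), ("z", "Z")]
    print(build_finetune_mix(new, old))
    print(missing_numbers("Tighten to 25 Nm, then 3.5 turns.",
                          "Serrer à 52 Nm, puis 3,5 tours."))  # ['25']
```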
I have followed the practices described in this article: https://www.atltranslate.com/blog/8-best-practices-technical-documentation-translation-sap-sac
Thank you in advance.