I have been working on a Finnish-to-English translation project using OpenNMT-tf on Linux.
Initially, with a small dataset, the model's accuracy was fairly good, around 60-70%. But as we increased the data, the accuracy fell drastically, to 16-17%, and many words and numbers were missing from the target file.
The dataset had one sentence per line and was tokenized. We used the same code from GitHub that is available for German-to-English translation; no features were modified.
Can you suggest why the model failed, and also whether the code from GitHub can be used directly to translate Finnish-to-English text, or indeed any language pair?
A beginner in the machine learning field
Can you give more details? The dataset size, the preprocessing, the vocabulary size, etc.
Are you referring to these scripts?
Thanks for writing back.
Sorry for the mistake; we have actually been using the OpenNMT-py model.
So the scripts we have been using were from: https://github.com/OpenNMT/OpenNMT-
Initially we used 2k lines of data for training, which gave us an accuracy of 70%. Later, when we trained with 200,000 (2 lakh) lines, the accuracy turned out to be 16%.
The default vocabulary size of 50k was used for both the source and target files.
As part of preprocessing, we just tokenized the data on a delimiter and then preprocessed it further using the commands given in OpenNMT.
Let me know if you need any further information. We are just hoping to find a solution and develop a better model.
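One likely culprit is the 50k vocabulary cap: Finnish is morphologically rich, so 200k sentences can easily contain far more than 50k distinct word forms, and any word outside the top 50k is replaced by `<unk>` at preprocessing time, which would explain words and numbers going missing from the output. A minimal sketch of that effect (the corpus and numbers here are illustrative, not from the actual data):

```python
from collections import Counter

def unk_rate(sentences, vocab_size):
    """Fraction of running tokens that fall outside the top-`vocab_size` word types."""
    counts = Counter(tok for s in sentences for tok in s.split())
    kept = {w for w, _ in counts.most_common(vocab_size)}
    total = sum(counts.values())
    oov = sum(c for w, c in counts.items() if w not in kept)
    return oov / total

# Toy corpus: inflected variants of "talo" (house) inflate the type count,
# as Finnish case endings do at scale.
corpus = [
    "talo on iso", "talossa on kissa", "talosta tuli ääni",
    "taloon meni mies", "talolla seisoo auto",
]

# With a tight vocab cap, the rarer inflected forms all become <unk>.
print(unk_rate(corpus, vocab_size=5))
```

On a real 200k-sentence Finnish corpus you can run the same count over the actual training file to see how much of the text a 50k vocabulary truly covers.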
The preprocessing included tokenizing the data on spaces, with one sentence per line. And the vocabulary size was the default 50k, as mentioned above.
Our customers have been very pleased with the Dutch-English translations made by a Transformer model (TensorFlow) trained with data first processed with SentencePiece. Have you looked at that?
I haven’t seen any such material yet. Could you please share the link with me?
Everything you need to know to get started is here: https://github.com/google/sentencepiece
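To illustrate the idea behind subword segmentation (the real SentencePiece library learns its piece inventory from your corpus; the pieces below are hand-picked for the toy example), a greedy longest-match segmenter shows how inflected Finnish forms decompose into shared pieces, so the model sees "talo" + a case ending instead of many rare whole words:

```python
def segment(word, pieces):
    """Greedy longest-match segmentation into known subword pieces.
    Falls back to single characters, so it never produces <unk>."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in pieces or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return out

# Hand-picked toy pieces; SentencePiece would learn these from data.
pieces = {"talo", "ssa", "sta", "lla", "on", "kissa"}

print(segment("talossa", pieces))  # ['talo', 'ssa']
print(segment("talosta", pieces))  # ['talo', 'sta']
```

The practical upshot: with subwords, a fixed vocabulary (often 8k-32k pieces) covers the whole corpus, which is why it helps morphologically rich languages like Finnish far more than whitespace tokenization with a 50k word cap.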
I shall go through it.