Indigenous languages: first attempt with Palikúr, looking for advice

Hi everyone,

I live in French Guiana, where a number of indigenous languages are spoken. I'd like to empower those cultures by helping them translate their languages. I made a first attempt with the Palikur language, following this basic tutorial: https://hackernoon.com/neural-machine-translation-using-open-nmt-for-training-a-translation-model-1129a3a2a2d3 and running it on a Google Cloud machine.

I have some texts translated Palikur-French, Palikur-Portuguese, and also some Palikur-English. I have more than 12,000 lines. I could easily double that, but I guess it's still not enough. Do you know how many lines I would need for a BLEU score of 40-50? (At the moment I get 3.5 for Palikur-French.) I'm asking so I can estimate how much money I may need to raise to collect sentences.

Because this tutorial was bilingual, I used Google Translate on my Portuguese sentences (that's bad, I know), but I guess I could use a model already trained on French-Portuguese-English and try to add Palikur to it?

Can you recommend some readings/texts/tutorials for building a first prototype? It's a non-profit, open-source project. The main goal is to build bridges between lesser-known languages and well-known ones.

Hi Dorian, I built a Tagalog-English model with 120K sentence pairs and get reasonable translations of simple sentences. I don’t think 12K sentence pairs (or even double that) will be enough. You will need to find some volunteers to do some translations for you. Good luck!

There are Bible translations for a great many indigenous languages, and these are often used for training translators.

One side effect is that the indigenous language can end up being translated into the English of 300 or 500 years ago. It is therefore important to use modern-language English and Portuguese Bibles that can be aligned against the indigenous-language versions.
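
In case it helps, here is a minimal sketch of that alignment step. It assumes each Bible version is available as plain text with one verse per line in the form `BOOK CHAPTER:VERSE<TAB>text`. That layout, the function names, and the placeholder strings are all assumptions made for illustration, not something a particular Bible source or toolkit provides. The idea is simply to pair verses that share the same ID and drop any verse missing from either side.

```python
def parse_verses(lines):
    """Map verse ID -> text for lines shaped like 'BOOK CHAPTER:VERSE<TAB>text'.

    The tab-separated layout is an assumption; adapt it to your files.
    """
    verses = {}
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            # Skip blank lines and lines without a verse text.
            continue
        verse_id, text = line.split("\t", 1)
        verses[verse_id.strip()] = text.strip()
    return verses


def align_by_verse(source_lines, target_lines):
    """Return (source_text, target_text) pairs for verse IDs found in both versions."""
    source = parse_verses(source_lines)
    target = parse_verses(target_lines)
    # Iterate in source order so the output follows the source file.
    return [
        (source[verse_id], target[verse_id])
        for verse_id in source
        if verse_id in target and source[verse_id] and target[verse_id]
    ]


if __name__ == "__main__":
    # Placeholder strings stand in for real verse text.
    palikur = ["JHN 3:16\t<Palikur verse text>", "JHN 3:17\t<Palikur verse text>"]
    french = ["JHN 3:16\t<modern French verse text>"]
    for src, tgt in align_by_verse(palikur, french):
        print(src, "|||", tgt)
```

The resulting pairs can then be written out as two parallel files (one sentence per line), which is the input format most NMT tutorials expect. Verses are often longer than a single sentence, so some further splitting may be worth trying.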
