Indigenous languages: first try with Palikúr - looking for advice

Hi everyone,

I live in French Guiana, where there are a number of indigenous languages. I’d like to empower those cultures by helping them translate their languages. I made a first attempt with the Palikur language, following this basic tutorial and running it on a Google Cloud machine.

I have some texts translated Palikur-French, Palikur-Portuguese and also some Palikur-English. I have more than 12,000 lines. I could easily double that, but I guess it’s still not enough. Do you know how many lines I need for a BLEU score of 40-50? (I’m at 3.5 for the moment for Palikur-French.) I’m asking so I can estimate how much money I may need to raise to collect sentences.

Because this tutorial was bilingual, I used Google Translate on my Portuguese sentences (that’s bad, I know), but I guess I could use a model, maybe one already trained for French-Portuguese-English, and try to add Palikur to it?
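
One common way to train a single model on several language pairs is to prepend a target-language token to each source sentence, so the Palikur-French, Palikur-Portuguese and Palikur-English texts can be pooled rather than machine-translating the Portuguese side. Below is a minimal sketch of that preprocessing step, assuming the sentence pairs are already loaded as Python tuples. The `<2fr>`-style tokens, the function names and the sample sentences are all illustrative placeholders, not something from the tutorial.

```python
# Sketch only: pool several bilingual corpora into one multilingual training set
# by tagging each source sentence with the language it should be translated into.

def tag_pairs(pairs, target_lang):
    """Prepend a target-language token such as <2fr> to each source sentence."""
    token = f"<2{target_lang}>"
    return [(f"{token} {src.strip()}", tgt.strip()) for src, tgt in pairs]


def build_multilingual_corpus(corpora):
    """Merge {target_lang: [(source, target), ...]} into one tagged list of pairs."""
    merged = []
    for lang, pairs in corpora.items():
        merged.extend(tag_pairs(pairs, lang))
    return merged


# Placeholder data (hypothetical): replace with the real aligned lines.
corpora = {
    "fr": [("palikur sentence 1", "phrase française 1")],
    "pt": [("palikur sentence 2", "frase portuguesa 2")],
    "en": [("palikur sentence 3", "english sentence 3")],
}

corpus = build_multilingual_corpus(corpora)
for source, target in corpus:
    print(source, "|||", target)
```

The tagged pairs can then be written out as one source file and one target file for whatever toolkit is used. The tag has to survive tokenization as a single unit, which usually means declaring it as a special symbol in the tokenizer.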

Can you recommend some lectures/texts/tutorials for building a first prototype? It’s a non-profit, open-source project. The main goal is to build bridges between unknown languages and known ones.

Hi Dorian, I built a Tagalog-English model with 120K sentence pairs and got reasonable translations of simple sentences. I don’t think 12K sentence pairs (or even double that) will be enough. You will need to find some volunteers to do some translations for you. Good luck!


There are Bible translations for a great many indigenous languages, and these are often used for training translators.

This can cause a phenomenon where the indigenous language is translated into the English of 300 or 500 years ago. To avoid that, it is important to use modern-language English and Portuguese Bibles that can be aligned against the indigenous-language versions.
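
Because Bible editions share a book/chapter/verse numbering, the alignment can often be done by verse reference rather than by sentence matching. Here is a rough sketch, assuming each edition has been exported as lines of the form `REFERENCE<TAB>text`. That file layout, the function names and the sample lines are assumptions for illustration, and real editions sometimes merge or renumber verses, so the output needs a manual check.

```python
# Sketch only: align two Bible editions verse by verse using shared references.

def parse_verses(lines):
    """Map a verse reference such as 'JHN 1:1' to its text, skipping malformed lines."""
    verses = {}
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue  # blank line, or a line with no reference/text separator
        ref, text = line.split("\t", 1)
        verses[ref.strip()] = text.strip()
    return verses


def align_by_reference(source_lines, target_lines):
    """Return (source_text, target_text) pairs for verses present in both editions."""
    src = parse_verses(source_lines)
    tgt = parse_verses(target_lines)
    return [(src[ref], tgt[ref]) for ref in src if ref in tgt]


# Placeholder data (hypothetical): not real verse text.
palikur_lines = [
    "JHN 1:1\tpalikur verse 1",
    "JHN 1:2\tpalikur verse 2",
    "JHN 1:3\tpalikur verse 3",
]
english_lines = [
    "JHN 1:1\tmodern English verse 1",
    "JHN 1:3\tmodern English verse 3",
]

pairs = align_by_reference(palikur_lines, english_lines)
for source, target in pairs:
    print(source, "|||", target)
```

Verses missing from either edition are simply dropped, which keeps the parallel corpus clean at the cost of a few lines.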

