OpenNMT Forum

Quick Start Tutorial Google Colab

I have included both text and code blocks in this Colab. It is a very fast way to see file and console outputs without installing anything on your local machine.

This is interesting. If this were combined with the tab-delimited bilingual sentence pairs from the Tatoeba Project (manythings.org, also good for Anki and similar flashcard applications), good models could be built for those languages.

Hello @Plkmoi

I’m not very experienced in deep learning; I have only been using the library for a month :grin:

This link seems very useful, but the number of sentences might not be adequate for many languages.

http://www.manythings.org/bilingual/

English - Russian (421K)
English - Italian (345K)
English - German (227K)

As far as I know from previous forum topics, at least 1M sentences are needed for decent translations. But I will try with the same Colab; some preprocessing is needed (wget the zip file, create two separate files from the tab-separated values, etc.).
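A minimal sketch of the splitting step, in case it helps. It assumes each line of the unzipped file has the source sentence in the first tab-separated column and the target sentence in the second, with any further columns (such as attribution) ignored. The function names and file names here are only illustrative, not part of OpenNMT.

```python
def split_tsv_pairs(tsv_lines):
    """Split tab-separated lines into parallel source and target lists.

    Assumption: column 0 is the source sentence and column 1 the target;
    extra columns are ignored. Lines without two non-empty columns are skipped.
    """
    src, tgt = [], []
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank line or no tab: not a sentence pair
        source, target = fields[0].strip(), fields[1].strip()
        if source and target:
            src.append(source)
            tgt.append(target)
    return src, tgt


def write_parallel_files(tsv_path, src_path, tgt_path):
    """Read a tab-separated file and write one sentence per line to two files."""
    with open(tsv_path, encoding="utf-8") as f:
        src, tgt = split_tsv_pairs(f)
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("\n".join(src) + "\n")
    with open(tgt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(tgt) + "\n")
    return len(src)  # number of sentence pairs written
```

After unzipping, something like write_parallel_files("pairs.txt", "src.txt", "tgt.txt") would produce the two line-aligned files (the file names are placeholders). A train/validation split would still have to be done separately.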

Which language pair do you want me to try first from this corpus?

Thank you.

English and Berber would be interesting, as Berber is written in Latin script as well as Tifinagh script, which has very different letters: https://www.manythings.org/anki/ber-eng.zip. In the Tatoeba language index there are 542,769 Kabyle sentences (Kabyle is a variant of Berber) and 382,839 Berber sentences.

131,357 sentences; maybe that could give some results. I will try that in a different Colab then.