Linguee Open Source?

jukhamil · June 29, 2021, 7:04pm

Hey,

I was wondering if anyone knows of an open-source tool similar to Linguee. My understanding is that Linguee generates its dictionary automatically by having a web crawler go over bilingual texts.

I am asking here since I imagine the technology is related to OpenNMT or that even some of the underlying techniques are the same, and that people here would know.

Thanks very much,
Julius

ymoslem · June 29, 2021, 11:17pm

Dear Julius,

I think what you are referring to is a word aligner.

You use one of these tools (fast_align, eflomal, or efmaral) to generate word alignments in the Pharaoh format.
You use a Python script like this one here to generate phrases. Note that the script's alignment variable takes the Pharaoh alignment, but you have to convert it to a list of (source, target) index tuples first. Also, one small edit is needed: in Python 3, print() must be called with parentheses, or you can write the output to a file instead.
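
The conversion to a list of tuples can be sketched roughly as below. This is only an illustration, assuming the aligner writes one line per sentence pair with space-separated "source-target" index pairs (the Pharaoh format); the function name parse_pharaoh is made up here and is not part of any of the tools above.

```python
def parse_pharaoh(line):
    """Convert one Pharaoh-format line, e.g. "0-0 1-1 3-2",
    into a list of (source_index, target_index) tuples."""
    pairs = []
    for token in line.split():
        # Each token looks like "3-2": source index, dash, target index.
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


# Hypothetical alignment line for one sentence pair.
alignment = parse_pharaoh("0-0 1-1 3-2 6-3 5-4")
print(alignment)
```

The resulting list can then be passed to the phrase-extraction script in place of its hard-coded alignment.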

Here is an example of the input and output you can expect:

srctext = "etroit dans la plupart des pays africains"
trgtext = "narrow in most african countries"
alignment = [(0, 0), (1, 1), (3, 2), (6, 3), (5, 4)]  # from eflomal

( 1) (0, 1) etroit — narrow
( 2) (0, 2) etroit dans — narrow in
( 3) (0, 3) etroit dans la — narrow in
( 4) (0, 4) etroit dans la plupart — narrow in most
( 5) (0, 5) etroit dans la plupart des — narrow in most
( 6) (0, 7) etroit dans la plupart des pays africains — narrow in most african countries
( 7) (1, 2) dans — in
( 8) (1, 3) dans la — in
( 9) (1, 4) dans la plupart — in most
(10) (1, 5) dans la plupart des — in most
(11) (1, 7) dans la plupart des pays africains — in most african countries
(12) (2, 4) la plupart — most
(13) (2, 5) la plupart des — most
(14) (2, 7) la plupart des pays africains — most african countries
(15) (3, 4) plupart — most
(16) (3, 5) plupart des — most
(17) (3, 7) plupart des pays africains — most african countries
(18) (4, 6) des pays — countries
(19) (4, 7) des pays africains — african countries
(20) (5, 6) pays — countries
(21) (5, 7) pays africains — african countries
(22) (6, 7) africains — african
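
For anyone who wants to see the idea without the linked script, here is a minimal sketch of consistency-based phrase extraction. It is not that script: the function name extract_phrases is made up, and it skips the usual extra step of extending phrases over unaligned target words (in this example every target word is aligned, so that step would change nothing).

```python
def extract_phrases(src_tokens, tgt_tokens, alignment):
    """Return ((start, end), source_phrase, target_phrase) for every
    source span whose aligned target span is consistent with the alignment.
    Spans are half-open: [start, end)."""
    phrases = []
    n = len(src_tokens)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # Target positions aligned to any source word inside the span.
            tgt_idx = [t for s, t in alignment if start <= s < end]
            if not tgt_idx:
                continue  # span contains only unaligned words
            t_min, t_max = min(tgt_idx), max(tgt_idx)
            # Consistent: no target word in [t_min, t_max] is aligned
            # to a source word outside the span.
            consistent = all(
                start <= s < end
                for s, t in alignment
                if t_min <= t <= t_max
            )
            if consistent:
                phrases.append((
                    (start, end),
                    " ".join(src_tokens[start:end]),
                    " ".join(tgt_tokens[t_min:t_max + 1]),
                ))
    return phrases


srctext = "etroit dans la plupart des pays africains"
trgtext = "narrow in most african countries"
alignment = [(0, 0), (1, 1), (3, 2), (6, 3), (5, 4)]

phrases = extract_phrases(srctext.split(), trgtext.split(), alignment)
for k, (span, src, tgt) in enumerate(phrases, start=1):
    print(f"({k:2d}) {span} {src} | {tgt}")
```

On the example sentence this yields the same 22 phrase pairs as the listing above, in the same order.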

As for OpenNMT, it uses Neural Machine Translation; it is Statistical (Phrase-based) Machine Translation that depends on word aligners like these (the best known being GIZA++).

Note that OpenNMT-py and OpenNMT-tf can still generate these alignments at translation time, after training, with extra options (see report_align and with_alignments, respectively), but specialized word aligners should be more accurate.

I hope this helps!

Kind regards,
Yasmin

SamuelLacombe · June 30, 2021, 2:26pm

There is something called Sketch Engine that lets you do something like Linguee (there is a free trial). They also offer NoSketch Engine, which is free, but I'm not sure whether it includes that feature.

If you want to have a look:
Sketch Engine