OpenNMT

Linguee Open Source?

Hey,

I was wondering if anyone knows an open-source tool similar to Linguee. My understanding is that Linguee automatically generated a dictionary with a web crawler going over bilingual texts.

I am asking here since I imagine the technology is related to OpenNMT or that even some of the underlying techniques are the same, and that people here would know.

Thanks very much,
Julius

Dear Julius,

I think what you are referring to is a word aligner.

  1. Use one of these tools (fast_align, eflomal, or efmaral) to generate word alignments in the Pharaoh format.
  2. Use a Python script like this one here to generate phrases. Note that the script's alignment argument is the Pharaoh alignment, but you have to convert it to a list of tuples first. Also, a small edit: print needs parentheses in Python 3, or you can write the output to a file instead.
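
The conversion mentioned in step 2 could look roughly like the sketch below. It assumes the aligner wrote one sentence pair per line as space-separated "source-target" index pairs (the Pharaoh format); the function name parse_pharaoh is only illustrative and is not part of any of the tools above.

```python
def parse_pharaoh(line):
    """Turn a Pharaoh line such as '0-0 1-1 3-2' into [(0, 0), (1, 1), (3, 2)]."""
    pairs = []
    for link in line.split():
        # Each link is "<source index>-<target index>".
        src, trg = link.split("-")
        pairs.append((int(src), int(trg)))
    return pairs


# Hypothetical aligner output for a single sentence pair.
alignment = parse_pharaoh("0-0 1-1 3-2 6-3 5-4")
print(alignment)
```

To process a whole alignment file, you would call the function once per line and pair each result with the matching source and target sentences.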

Here is an example of the input and output you can expect:

srctext = "etroit dans la plupart des pays africains"
trgtext = "narrow in most african countries"
alignment = [(0, 0), (1, 1), (3, 2), (6, 3), (5, 4)]  # eflomal

( 1) (0, 1) etroit — narrow
( 2) (0, 2) etroit dans — narrow in
( 3) (0, 3) etroit dans la — narrow in
( 4) (0, 4) etroit dans la plupart — narrow in most
( 5) (0, 5) etroit dans la plupart des — narrow in most
( 6) (0, 7) etroit dans la plupart des pays africains — narrow in most african countries
( 7) (1, 2) dans — in
( 8) (1, 3) dans la — in
( 9) (1, 4) dans la plupart — in most
(10) (1, 5) dans la plupart des — in most
(11) (1, 7) dans la plupart des pays africains — in most african countries
(12) (2, 4) la plupart — most
(13) (2, 5) la plupart des — most
(14) (2, 7) la plupart des pays africains — most african countries
(15) (3, 4) plupart — most
(16) (3, 5) plupart des — most
(17) (3, 7) plupart des pays africains — most african countries
(18) (4, 6) des pays — countries
(19) (4, 7) des pays africains — african countries
(20) (5, 6) pays — countries
(21) (5, 7) pays africains — african countries
(22) (6, 7) africains — african
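
In case it helps to see how such a listing comes about, here is a minimal sketch of consistent phrase-pair extraction as used in phrase-based SMT. It is not the script linked above, just my own simplified version under the assumption that the alignment is a list of (source index, target index) tuples; the function name extract_phrases is made up for this example.

```python
def extract_phrases(src_tokens, trg_tokens, alignment):
    """Return sorted ((src_start, src_end), src_phrase, trg_phrase) tuples.

    src_end is exclusive, like the spans in the listing above.
    """
    trg_aligned = {t for _, t in alignment}
    phrases = set()
    for s_start in range(len(src_tokens)):
        for s_end in range(s_start, len(src_tokens)):
            # Target positions linked to any word in this source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a span without alignment links yields no pair
            t_min, t_max = min(linked), max(linked)
            # Consistency check: no target word inside [t_min, t_max] may be
            # aligned to a source word outside the source span.
            if any(t_min <= t <= t_max and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            # Also accept target spans widened over unaligned neighbours.
            t_lo = t_min
            while t_lo >= 0 and (t_lo == t_min or t_lo not in trg_aligned):
                t_hi = t_max
                while t_hi < len(trg_tokens) and (
                        t_hi == t_max or t_hi not in trg_aligned):
                    phrases.add((
                        (s_start, s_end + 1),
                        " ".join(src_tokens[s_start:s_end + 1]),
                        " ".join(trg_tokens[t_lo:t_hi + 1]),
                    ))
                    t_hi += 1
                t_lo -= 1
    return sorted(phrases)


srctext = "etroit dans la plupart des pays africains"
trgtext = "narrow in most african countries"
alignment = [(0, 0), (1, 1), (3, 2), (6, 3), (5, 4)]

phrases = extract_phrases(srctext.split(), trgtext.split(), alignment)
for number, (span, src, trg) in enumerate(phrases, 1):
    print(f"({number:2}) {span} {src} | {trg}")
```

On the example sentence this yields the same 22 pairs as in the listing. A span such as (0, 6) is skipped because "african" falls inside the target range but is aligned to "africains", which lies outside the source span.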

As for the relation to OpenNMT: it uses Neural Machine Translation, whereas it is Statistical (Phrase-based) Machine Translation that depends on word aligners like these (the famous GIZA++, for example).

Note that OpenNMT-py and OpenNMT-tf can still generate these alignments at translation time, once the model is trained, with extra options (see report_align and with_alignments), but specialized word aligners should be more accurate.

I hope this helps!

Kind regards,
Yasmin


There is something called Sketch Engine that enables you to do something like Linguee (there is a free trial). They also offer "no-sketch", which is free, but I'm not sure if this includes that feature...

If you want to have a look:
Sketch Engine