Explaining the concept

I want to mention that, so far, a lot of people have tried to explain in layman’s terms how Neural Machine Translation works, and they have all failed except maybe ONE GENTLEMAN. But then again, his explanation is too complex.

Here is his work: https://towardsdatascience.com/neural-machine-translation-15ecf6b0b

So this is not quite IT…

All the others (in writing or on YouTube) gently “skip” the explanation because they obviously cannot understand it :slight_smile:

Given this context, I am sure that if a serious and gifted writer wrote a book or an article to explain the concept, he or she would hit the (recognition) jackpot.

I believe that the best way to proceed would be to explain what happens when:

  1. 40 million sentences are pumped into the database.
  2. A sentence is entered to be translated.
  3. You want to correct a translation mistake.

It will be very interesting to read the person’s work.

Thank you all for your attention.

You probably want to search for more general subjects like machine learning or neural networks. The concepts behind neural machine translation are the same as image recognition for example.

Tell me something, Guillaume. In fact, all you do is inject a few million translated sentences into an NMT database and the program does all the rest?

From the user’s perspective, that is basically what happens. However, the program exposes several options to customize its behavior: how the data is prepared, which model is trained, what the training strategy is, etc.

Are you looking for more information on how models are trained or how they are executed to produce translations?

Guillaume, honestly, what I’d like to know first of all is how things work from the moment the computer turns all the words into numbers and builds itself a sort of dictionary with two columns: words and numbers. Afterwards, we’ll see.

I understand that the first thing the neural MT program does is to turn words into numbers.

ex: First sentence of the database:
I see a red door
I = 001
see = 002
a = 003
red = 004
door = 005

je vois une porte rouge
je = 006
vois = 007
une = 008
porte = 009
rouge = 010

A sort of word-number dictionary is built that way. If a word is not in the dictionary, a new entry is created (“open”, “the” and “big” below).

Second sentence of the database
I open the big door
I = 001
open = 011
the = 012
big = 013
door = 005

j’ = 014
ouvre = 015
la = 016
grande = 017
porte = 009

The program therefore creates a sort of two-column database, or dictionary: words and numbers. A word always has the same number in the database.

Am I on the right track?

Yes, you get a mapping from tokens to IDs. In NMT, the vocabulary is usually defined over subwords rather than whole words. For further reading, have a look at byte pair encoding, SentencePiece, and word embeddings.
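To make the mapping concrete, here is a minimal sketch in Python that reproduces the numbering from the two example sentences above. It is only an illustration: `build_vocab` and `encode` are hypothetical helper names, tokens are split on spaces (so “j’” is written as a separate token), and a single shared word-level vocabulary is assumed, whereas real systems usually work on subwords and often keep separate source and target vocabularies.

```python
# Toy word-level vocabulary; an illustration only, not how OpenNMT does it internally.
def build_vocab(sentences):
    """Give each new token the next free integer ID, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # IDs start at 1, as in the example above
    return vocab


def encode(sentence, vocab):
    """Replace each token of a sentence with its ID."""
    return [vocab[token] for token in sentence.split()]


corpus = [
    "I see a red door",
    "je vois une porte rouge",
    "I open the big door",
    "j' ouvre la grande porte",
]
vocab = build_vocab(corpus)
print(encode("I open the big door", vocab))  # [1, 11, 12, 13, 5]
```

A word that was already seen (“I”, “door”, “porte”) keeps its number, and only unseen words get a new entry.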

Let’s recap. We have a word-number dictionary with 300 thousand entries. So what’s next?

In short:

Each source number (or index) is mapped to a vector of floating point values. These vectors are then fed into the neural network, which applies a series of mathematical transformations to output a probability distribution over the next target number (or index) to produce. We select the index with the highest probability as the word to produce and repeat this process to generate the complete target sequence.
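As a rough sketch of that loop (index to vector, transformation, probability distribution, pick the most probable index, repeat), here is a toy version in plain Python. Everything in it is an assumption made for illustration: the tiny vocabularies, the vector size `DIM`, the random untrained weights, and the very crude “transformation” (an average plus `tanh`). A real NMT model uses a much larger, trained network, so the output of this toy is meaningless; only the mechanics are the point.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabularies; indices are positions in these lists.
SRC_VOCAB = ["I", "see", "a", "red", "door"]
TGT_VOCAB = ["<s>", "</s>", "je", "vois", "une", "porte", "rouge"]
DIM = 4  # size of each word vector (arbitrary choice)


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


src_embed = random_matrix(len(SRC_VOCAB), DIM)  # one vector per source index
tgt_embed = random_matrix(len(TGT_VOCAB), DIM)  # one vector per target index
out_proj = random_matrix(len(TGT_VOCAB), DIM)   # turns a hidden vector into one score per target index


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(src_ids, prev_tgt_id):
    # Stand-in for the network: average the source vectors, add the previous target vector.
    context = [sum(src_embed[i][d] for i in src_ids) / len(src_ids) for d in range(DIM)]
    hidden = [math.tanh(context[d] + tgt_embed[prev_tgt_id][d]) for d in range(DIM)]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in out_proj]
    return softmax(scores)


def greedy_translate(src_ids, max_len=10):
    output, prev = [], TGT_VOCAB.index("<s>")
    for _ in range(max_len):
        probs = next_word_distribution(src_ids, prev)
        prev = max(range(len(probs)), key=probs.__getitem__)  # index with highest probability
        if TGT_VOCAB[prev] == "</s>":
            break
        output.append(TGT_VOCAB[prev])
    return output


print(greedy_translate([0, 1, 2, 3, 4]))  # untrained weights, so expect nonsense
```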

During training, the transformations are iteratively tuned to maximize the probability of generating the correct target sequence given a source sequence.
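One way to picture what “maximize the probability of the correct target sequence” means is the sketch below. It assumes the usual formulation where training minimizes the negative log-probability of the correct words; the function name `sequence_nll` and the probability values are made up for illustration.

```python
import math


def sequence_nll(step_distributions, target_ids):
    """Negative log-probability of the correct target sequence.

    step_distributions[t] is the model's probability distribution at step t,
    target_ids[t] is the index of the correct word at step t.
    """
    return -sum(math.log(dist[t]) for dist, t in zip(step_distributions, target_ids))


target = [1, 3]  # indices of the correct target words (made up)

# Before training: the model has no idea, every word is equally likely.
before = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
# After some training: most of the probability sits on the correct words.
after = [[0.1, 0.7, 0.1, 0.1], [0.05, 0.05, 0.1, 0.8]]

print(round(sequence_nll(before, target), 2))  # 2.77
print(round(sequence_nll(after, target), 2))   # 0.58
```

Training nudges the weights, step after step, in whatever direction makes this number smaller over the whole set of sentence pairs.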

Can you give examples, please, Guillaume?

I understand no one is going to come up with a simple explanation to indicate how the NMT works.

OK so I am interested in building a French-English dictionary.
Do I have to download the program (app) or buy it?

Is the app specific to the French-English pair or not?
Is the app open source?

@Robert you are definitely asking questions on the wrong forum.
OpenNMT is a community of experienced users and researchers or people who are ready to invest a lot of time to understand the basics and go deeper in this area.

Asking about building a dictionary here shows that you need to do some reading and homework. OpenNMT is neither Coursera nor a hotline for any kind of question.

Please refrain from asking these questions and read some blog posts about machine learning in general and NLP more specifically.


To tell you the truth I have given up all hope of understanding how NMT works for it seems no one really knows how it works so all I get is the run-around. :slight_smile: This NMT is definitely a strange concept. This is the first time no one can explain a concept. Oh well, maybe one day a Martian will come around, will understand the concept and will explain it to us. :slight_smile:

I’d be really surprised to hear that this thing works the way it is presented. :slight_smile: How can you do translation with numbers, statistics and absolutely no grammar rules?

If we follow your logic -translation by numbers- all we’d get is surroundings of a word.

Ex: “door” (300) usually comes with a “the” (100). Can you do translation with such info? NO!

Dear Robert,

If this is your ultimate goal, then as Vincent said, this is not the right forum, and OpenNMT is not the right tool; plus, NMT is not the way to go for a regular dictionary.

What to do? Simply search for something like a “dictionary API”. There are plenty of APIs (free and paid) that can help you create a dictionary by simply extracting the data you need from a JSON output.

The problem is that some people think they can search for Neural Machine Translation or NMT and find basic answers about NMT per se. The truth is that Neural Machine Translation is just an application of Deep Neural Networks, which are an advancement of Logistic Regression. So even though you can find a simplifying article like this one (“Making Sense of Neural Machine Translation”), it might still require some background.

Logistic Regression sounds like a big term, but it is simple. Let’s take an example of a Binary Classification problem; let’s imagine we have a big database of a million people, both Diabetes patients and healthy people, which includes columns like “age”, “cholesterol”, “blood pressure”, “smoking”, “parent with Diabetes”, etc., and finally whether each person has Diabetes or not. Can we have some mathematical/statistical analysis that tells us the probability that someone with specific features has or will soon have Diabetes?

Of course, we can. However, the problem is that not all features necessarily have the same importance (or weight), so we have to figure out the right weight of each feature. For example, maybe “age” is important, but not as important as “cholesterol” or as whether the person has a “parent with Diabetes”. We cannot just guess. So we have to calculate the weight of each feature as well. How to do this? We already have the right answers in the database, so we try different weights on the different features until the output guessed by our model is very similar to what is actually in the database.
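To give a feel for “trying different weights until the guesses match the database”, here is a minimal sketch of Logistic Regression trained with gradient descent in plain Python. The six rows of data are invented and already scaled between 0 and 1, only three of the features are used, and the learning rate and number of passes are arbitrary choices; it is a toy under those assumptions, not a medical model.

```python
import math


def sigmoid(z):
    """Squash any score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up rows: ([age, cholesterol, parent_with_diabetes], has_diabetes)
data = [
    ([0.2, 0.1, 0.0], 0),
    ([0.3, 0.2, 0.0], 0),
    ([0.5, 0.3, 0.0], 0),
    ([0.6, 0.8, 1.0], 1),
    ([0.7, 0.9, 0.0], 1),
    ([0.4, 0.7, 1.0], 1),
]


def predict(weights, bias, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(data, epochs=5000, lr=0.5):
    weights, bias = [0.0, 0.0, 0.0], 0.0  # start with no opinion about any feature
    for _ in range(epochs):
        for features, label in data:
            # How far the guess is from the right answer stored in the database
            error = predict(weights, bias, features) - label
            # Nudge each weight in the direction that reduces the error
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


weights, bias = train(data)
print([round(w, 2) for w in weights])
print(round(predict(weights, bias, [0.6, 0.85, 1.0]), 2))  # probability for a new person
```

Nobody sets the weights by hand: they start at zero and are adjusted a little after every row, which is the same idea that is used, on a much larger scale, to train a neural network.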

How to do this? If one gets such a question at high school or college, they will try to remember some equation they studied, meaning we cannot just do this without studying these equations and formulas.

As I said, a Neural Network is an advancement of Logistic Regression that goes deeper and makes more calculations to get more accurate results, and Neural Machine Translation is in fact a multi-class classification problem: at each step the model picks one word out of the whole vocabulary, compared with the two possible answers of the binary classification explained above. Natural Language Processing in general adds something called Word Embeddings, which represent each word as a vector of numbers so that relations between words in a dataset can be captured.
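As a small illustration of what “relations between words” can look like once words are vectors, the sketch below compares hand-picked three-number vectors using cosine similarity. The vectors are invented for the example; in a real system they have hundreds of dimensions and are learned during training.

```python
import math


def cosine(u, v):
    """Cosine similarity: close to 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-picked toy vectors (assumption: real embeddings are learned, not written by hand)
embeddings = {
    "door": [0.9, 0.1, 0.0],
    "porte": [0.8, 0.2, 0.1],
    "red": [0.0, 0.9, 0.3],
}

print(round(cosine(embeddings["door"], embeddings["porte"]), 2))  # 0.98
print(round(cosine(embeddings["door"], embeddings["red"]), 2))    # 0.1
```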

Neural Machine Translation is very similar to Auto-Complete: if I have the first three words, what word is expected next? However, Neural Machine Translation takes the source sentence into consideration as well. So with a very big dataset, the right weights, relations between words, sophisticated probability calculations, powerful hardware, and enough time, can I get a good prediction of the next word, and then the next word, etc., until I complete the target sentence? Of course, I can.

I hope I did not over-simplify things, but if you need to get into the technical explanation, I recommend you start with the courses from IBM about Machine Learning and Deep Learning (both of which are very good for beginners), and then move to the courses from DeepLearning.ai, all of which are on Coursera. In addition, the article above is very good. Also, you might want to have a look at Statistical Machine Translation.

All the best!