Dear Robert,

If this is your ultimate goal, then as Vincent said, this is not the right forum, and OpenNMT is not the right tool; plus, NMT is not the way to go for a regular dictionary.

What to do? Simply search for something like a “dictionary API”. There are plenty of APIs (free and paid) that can help you create a dictionary by extracting the data you need from their JSON output.
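To make the JSON extraction concrete, here is a minimal sketch. The response body and its field names (`word`, `meanings`, `partOfSpeech`, `definitions`) are made up for illustration; every real dictionary API defines its own schema, so check the documentation of whichever one you choose.

```python
import json

# Hypothetical response body. The field names are assumptions for this
# example only and do not describe any particular dictionary API.
SAMPLE_RESPONSE = """
[
  {
    "word": "forum",
    "meanings": [
      {
        "partOfSpeech": "noun",
        "definitions": [{"definition": "A place for discussion."}]
      }
    ]
  }
]
"""


def extract_entries(raw_json):
    """Return (word, part_of_speech, definition) tuples from the sample schema."""
    entries = []
    for item in json.loads(raw_json):
        for meaning in item.get("meanings", []):
            for definition in meaning.get("definitions", []):
                entries.append(
                    (item["word"], meaning.get("partOfSpeech", ""), definition["definition"])
                )
    return entries


entries = extract_entries(SAMPLE_RESPONSE)
for word, part_of_speech, definition in entries:
    print(f"{word} ({part_of_speech}): {definition}")
```

In practice you would replace `SAMPLE_RESPONSE` with the text returned by the API and store the tuples in whatever format your dictionary needs.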

The problem is that some people expect to search for Neural Machine Translation (NMT) and find basic answers about NMT per se. The truth is that Neural Machine Translation is just one application of Deep Neural Networks, which in turn build on Logistic Regression. So even though you can find a simplified article like this one (“Making Sense of Neural Machine Translation”), it might still require some **background**.

Logistic Regression sounds like a big term, but it is simple. Let’s take a Binary Classification problem as an example. Imagine we have a big database of a million people, both Diabetes patients and healthy people, with columns like “age”, “cholesterol”, “blood pressure”, “smoking”, “parent with Diabetes”, etc., and finally a column saying whether the person has Diabetes or not. Can some mathematical/statistical analysis tell us the probability that someone with specific features has or will soon have Diabetes?

Of course, we can. However, not all features carry the same importance (or weight), so we have to figure out the right weight for each feature. For example, maybe “age” is important, but not as important as “cholesterol” or as whether the person has a “parent with Diabetes”. We cannot just guess, so we have to calculate the weight of each feature as well. How to do this? We already have the right answers in the database, so we try different weights for the different features until the output predicted by our model is very close to what is actually in the database.

How to do this? If one gets such a question at high school or college, they will try to remember some equation they studied, meaning we cannot just do this without studying these equations and formulas.
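For the curious, the weight-finding idea can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the data is synthetic (random numbers, not real patients), the three features stand in for standardized “age”, “cholesterol”, and “parent with Diabetes”, and the "true" weights used to generate the labels are invented. The training loop is ordinary gradient descent on the logistic loss, which is one common way to fit such a model.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability of the positive class for one row of features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(rows, labels, learning_rate=0.5, epochs=300):
    """Adjust the weights step by step until predictions match the labels."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Move each weight a little in the direction that reduces the error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Synthetic data: the generating weights below are invented for the example.
random.seed(0)
TRUE_WEIGHTS = [0.5, 2.0, 1.5]  # "age", "cholesterol", "parent with Diabetes"
rows = [[random.gauss(0, 1) for _ in TRUE_WEIGHTS] for _ in range(500)]
labels = [
    1.0 if random.random() < predict(TRUE_WEIGHTS, 0.0, row) else 0.0
    for row in rows
]

weights, bias = train(rows, labels)
correct = sum((predict(weights, bias, r) >= 0.5) == (y == 1.0) for r, y in zip(rows, labels))
accuracy = correct / len(rows)
print("learned weights:", [round(w, 2) for w in weights])
print("training accuracy:", round(accuracy, 2))
```

The learned weights should come out roughly in the same order of importance as the invented ones, which is the whole point: the model recovers how much each feature matters from the answers already in the data.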

As I said, a Neural Network builds on Logistic Regression by going deeper and making more calculations to get more accurate results, and Neural Machine Translation is in fact a multi-class classification problem (compared to the binary classification explained above): at each step, the model picks one word out of the whole vocabulary. Natural Language Processing in general adds something called Word Embeddings, which simply capture relations between words in a dataset.
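To show what “multi-class” means here, the sketch below turns a score per candidate word into a probability per candidate word using the softmax function, the usual multi-class counterpart of the sigmoid used in Logistic Regression. The three-word vocabulary and the scores are made up; a real system has tens of thousands of words and computes the scores with a neural network.

```python
import math


def softmax(scores):
    """Convert arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores, invented for this example.
vocabulary = ["cat", "dog", "house"]
scores = [2.0, 1.0, 0.1]

probabilities = softmax(scores)
best_word = vocabulary[probabilities.index(max(probabilities))]

for word, p in zip(vocabulary, probabilities):
    print(f"{word}: {p:.3f}")
print("predicted class:", best_word)
```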

Neural Machine Translation is very similar to Auto-Complete: if I have the first three words, which word is expected next? However, Neural Machine Translation takes the source sentence into consideration as well. So with a very big dataset, the right weights, relations between words, sophisticated probability calculations, powerful hardware, and enough time, can I get a good prediction of the next word, and then the next word, and so on until I complete the target sentence? Of course, I can.
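The word-by-word loop can be sketched as follows. Please note the assumptions: the hand-written table `TOY_MODEL` is a stand-in for a trained network, its probabilities are invented, and it only looks at the previous target word, whereas a real NMT model conditions on the whole source sentence and everything generated so far. Always picking the most probable word is called greedy decoding; real systems often use beam search instead.

```python
# Hypothetical stand-in for a trained model:
# (source sentence, previous target word) -> {candidate next word: probability}
TOY_MODEL = {
    ("guten morgen", "<s>"): {"good": 0.9, "hello": 0.1},
    ("guten morgen", "good"): {"morning": 0.8, "evening": 0.2},
    ("guten morgen", "morning"): {"</s>": 1.0},
}


def greedy_translate(source, max_len=10):
    """Build the target sentence one word at a time, always taking the top word."""
    target = []
    previous = "<s>"  # start-of-sentence marker
    for _ in range(max_len):
        distribution = TOY_MODEL.get((source, previous))
        if distribution is None:
            break  # the toy table has no prediction for this context
        next_word = max(distribution, key=distribution.get)
        if next_word == "</s>":  # end-of-sentence marker
            break
        target.append(next_word)
        previous = next_word
    return " ".join(target)


translation = greedy_translate("guten morgen")
print(translation)
```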

I hope I did not over-simplify things, but if you need to get into the technical explanation, I recommend you start with the courses from IBM about Machine Learning and Deep Learning (both of which are very good for beginners), and then move on to the courses from DeepLearning.ai, all of which are on *Coursera*. In addition, the article above is very good. You might also want to have a look at Statistical Machine Translation.

All the best!

Yasmin