OpenNMT Forum

Words become numbers

I understand that the first thing the neural MT program does is to turn words into numbers.

Example: first sentence of the database:
I see a red door
I = 001
see = 002
a = 003
red = 004
door = 005

je vois une porte rouge
je = 006
vois = 007
une = 008
porte = 009
rouge = 010

A kind of word-to-number dictionary is built this way. If a word is not yet in the dictionary, a new entry is created ("open", "the" and "big" below).

Second sentence of the database:
I open the big door
I = 001
open = 011
the = 012
big = 013
door = 005

j’ouvre la grande porte
j’ = 014
ouvre = 015
la = 016
grande = 017
porte = 009

The program therefore builds a kind of two-column dictionary: words in one column, numbers in the other. A given word always maps to the same number throughout the database.
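
To make the idea concrete, here is a minimal Python sketch of the bookkeeping described above. The names build_vocab and encode are hypothetical, the sentences are split into words by hand, and English and French share one dictionary as in my example. It only illustrates the word-to-number scheme and is not meant to show how OpenNMT actually does it.

```python
def build_vocab(sentences):
    """Give each previously unseen word the next free number, starting at 1."""
    vocab = {}
    for tokens in sentences:
        for token in tokens:
            if token not in vocab:
                # New word: open a new dictionary entry.
                vocab[token] = len(vocab) + 1
    return vocab


def encode(tokens, vocab):
    """Replace every word by its number from the dictionary."""
    return [vocab[token] for token in tokens]


# The four sentences from the example, already split into words (assumption).
sentences = [
    ["I", "see", "a", "red", "door"],
    ["je", "vois", "une", "porte", "rouge"],
    ["I", "open", "the", "big", "door"],
    ["j'", "ouvre", "la", "grande", "porte"],
]

vocab = build_vocab(sentences)
print(encode(["I", "open", "the", "big", "door"], vocab))  # [1, 11, 12, 13, 5]
```

"I" and "door" keep the numbers they received in the first sentence, and "porte" stays 9 in both French sentences.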

Am I on the right track?
