I’m on a machine with 64G of RAM.
At first, another process was using 16 GB of RAM, and I got an OutOfMemory-type error in the “Preparing” step, while the largest file was being prepared.
I killed the other process. Now the training process seems to sit idle in the “Preparing” step, on the largest file: no CPU usage, very low RAM usage, no I/O, and nothing more written to the log, yet the process is still running.
[09/11/17 15:33:27 INFO] Parsing train data from directory '/home/lm-dev8/mmt_2017-07-05/DATA/train_FREN':
[09/11/17 15:33:27 INFO] * [2] Reading files 'OpenOffice.fr-en.' - 31902 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'LM_TRANSPORT.fr-en.' - 211199 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'LM_COMPUTING.fr-en.' - 193617 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'KDE4.fr-en.' - 180709 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'LM_COOKING.fr-en.' - 74148 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'Gnome.fr-en.' - 55391 sentences
[09/11/17 15:33:28 INFO] * [4] Reading files 'LM_MANAGEMENT.fr-en.' - 103575 sentences
[09/11/17 15:33:30 INFO] * [2] Reading files 'Wikipedia.fr-en.' - 803670 sentences
[09/11/17 15:33:30 INFO] * [2] Reading files 'Ubuntu.fr-en.' - 9314 sentences
[09/11/17 15:33:30 INFO] * [2] Reading files 'EMEA.fr-en.' - 373152 sentences
[09/11/17 15:33:30 INFO] * [2] Reading files 'PHP.fr-en.' - 16020 sentences
[09/11/17 15:33:31 INFO] * [2] Reading files 'ECB.fr-en.' - 195949 sentences
[09/11/17 15:33:33 INFO] * [1] Reading files 'DGT.fr-en.' - 1987655 sentences
[09/11/17 15:33:34 INFO] * [4] Reading files 'europarl.fr-en.' - 2007723 sentences
[09/11/17 15:34:02 INFO] * [3] Reading files 'MultiUN.fr-en.' - 10480212 sentences
[09/11/17 15:34:02 INFO] 16724236 sentences, in 15 files, in train directory
...
[09/11/17 15:34:04 INFO] --- Preparing train sample
[09/11/17 15:34:23 INFO] * [4] file 'OpenOffice.fr-en.': 31902 total, 3816 drawn, 3779 kept - unknown words: source = 31.1%, target = 20.6%
[09/11/17 15:34:24 INFO] * [4] file 'LM_TRANSPORT.fr-en.': 211199 total, 25257 drawn, 25161 kept - unknown words: source = 18.7%, target = 10.1%
[09/11/17 15:34:28 INFO] * [1] file 'LM_COMPUTING.fr-en.': 193617 total, 23155 drawn, 22086 kept - unknown words: source = 34.9%, target = 27.3%
[09/11/17 15:34:28 INFO] * [3] file 'KDE4.fr-en.': 180709 total, 21611 drawn, 20638 kept - unknown words: source = 45.5%, target = 30.8%
[09/11/17 15:34:29 INFO] * [2] file 'LM_COOKING.fr-en.': 74148 total, 8868 drawn, 8699 kept - unknown words: source = 43.5%, target = 22.4%
[09/11/17 15:34:31 INFO] * [2] file 'Ubuntu.fr-en.': 9314 total, 1114 drawn, 1108 kept - unknown words: source = 50.9%, target = 30.6%
[09/11/17 15:34:33 INFO] * [1] file 'LM_MANAGEMENT.fr-en.': 103575 total, 12387 drawn, 11582 kept - unknown words: source = 17.7%, target = 15.5%
[09/11/17 15:34:33 INFO] * [4] file 'Gnome.fr-en.': 55391 total, 6625 drawn, 6360 kept - unknown words: source = 38.7%, target = 28.0%
[09/11/17 15:34:33 INFO] * [1] file 'PHP.fr-en.': 16020 total, 1916 drawn, 1815 kept - unknown words: source = 28.4%, target = 18.7%
[09/11/17 15:34:34 INFO] * [2] file 'EMEA.fr-en.': 373152 total, 44625 drawn, 42271 kept - unknown words: source = 32.9%, target = 23.2%
[09/11/17 15:34:36 INFO] * [4] file 'ECB.fr-en.': 195949 total, 23433 drawn, 18530 kept - unknown words: source = 25.0%, target = 19.9%
[09/11/17 15:34:42 INFO] * [3] file 'Wikipedia.fr-en.': 803670 total, 96109 drawn, 90160 kept - unknown words: source = 26.8%, target = 24.2%
[09/11/17 15:35:28 INFO] * [1] file 'DGT.fr-en.': 1987655 total, 237698 drawn, 202591 kept - unknown words: source = 20.9%, target = 18.9%
[09/11/17 15:35:36 INFO] * [2] file 'europarl.fr-en.': 2007723 total, 240098 drawn, 209622 kept - unknown words: source = 6.4%, target = 20.5%
It’s now 16:25, and no new log line has appeared since 15:35:36.
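
To double-check which file the step is stuck on, the two parts of the log can be compared: every file listed under "Reading files" should eventually get a "file '...'" line in the "Preparing train sample" part. Below is a minimal sketch of that comparison. The regular expressions are written against the log lines pasted above; the function name `pending_files` and the short `SAMPLE` excerpt are only illustrative, not part of the tool.

```python
import re

# Patterns matched against the log format shown in this post (an assumption:
# other versions of the tool may format these lines differently).
READ_RE = re.compile(r"Reading files '([^']+)' - (\d+) sentences")
SAMPLED_RE = re.compile(r"file '([^']+)': (\d+) total")


def pending_files(log_text):
    """Return (name, sentence_count) pairs for files that were read
    but have no sampling line yet, largest file first."""
    read = {}
    sampled = set()
    for line in log_text.splitlines():
        match = READ_RE.search(line)
        if match:
            read[match.group(1)] = int(match.group(2))
            continue
        match = SAMPLED_RE.search(line)
        if match:
            sampled.add(match.group(1))
    missing = [(name, count) for name, count in read.items() if name not in sampled]
    return sorted(missing, key=lambda item: item[1], reverse=True)


# Short excerpt copied from the log above, used only to show the idea.
SAMPLE = """\
[09/11/17 15:33:33 INFO] * [1] Reading files 'DGT.fr-en.' - 1987655 sentences
[09/11/17 15:33:34 INFO] * [4] Reading files 'europarl.fr-en.' - 2007723 sentences
[09/11/17 15:34:02 INFO] * [3] Reading files 'MultiUN.fr-en.' - 10480212 sentences
[09/11/17 15:35:28 INFO] * [1] file 'DGT.fr-en.': 1987655 total, 237698 drawn, 202591 kept - unknown words: source = 20.9%, target = 18.9%
[09/11/17 15:35:36 INFO] * [2] file 'europarl.fr-en.': 2007723 total, 240098 drawn, 209622 kept - unknown words: source = 6.4%, target = 20.5%
"""

print(pending_files(SAMPLE))  # [('MultiUN.fr-en.', 10480212)]
```

Run on the full log, this should list only the files that have not produced a sampling line yet, which makes it easy to see whether the largest file is the only one still pending.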