I wanted to experiment with the impact of the fix_word_vecs options, since without them the word vectors keep being updated, so the input/output topology is supposed to change while the network is still searching for convergence (right?).
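(To make explicit what I mean by that: my understanding is that the fix_word_vecs options exclude the embedding tables from the gradient updates, so the word vectors stay frozen while the rest of the network keeps training. Here is a minimal PyTorch-style sketch of that idea; it only illustrates the concept, it is not OpenNMT's actual code, and all sizes are made up.)

```python
import torch.nn as nn
import torch.optim as optim

# Toy encoder: embedding lookup + LSTM (hypothetical sizes, illustration only).
emb = nn.Embedding(num_embeddings=50000, embedding_dim=500)
rnn = nn.LSTM(input_size=500, hidden_size=500)

# "Fixing" the word vectors: stop gradients from reaching the embedding
# table, so the lookup weights stay frozen while the LSTM keeps training.
emb.weight.requires_grad_(False)

# Give the optimizer only the parameters that are still trainable.
trainable = [p for m in (emb, rnn) for p in m.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1.0)
```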
I first launched a standard training run, without the fix_word_vecs options, which produced this quite normal-looking log (I used a learning rate > 1, which seems to be fine in this case; see the small check after the log):
> Epoch 1 ; Iteration 50/41568 ; Learning rate 1.5000 ; Source tokens/s 1204 ; Perplexity 68118751647.70
> Epoch 1 ; Iteration 100/41568 ; Learning rate 1.5000 ; Source tokens/s 1732 ; Perplexity 41221337814.30
> ...
> Epoch 1 ; Iteration 41500/41568 ; Learning rate 1.5000 ; Source tokens/s 3469 ; Perplexity 13.21
> Epoch 1 ; Iteration 41550/41568 ; Learning rate 1.5000 ; Source tokens/s 3469 ; Perplexity 13.20
> Validation perplexity: 8.543593651398
> Epoch 2 ; Iteration 50/41568 ; Learning rate 1.4250 ; Source tokens/s 3378 ; Perplexity 6.47
> ...
> Epoch 2 ; Iteration 41550/41568 ; Learning rate 1.4250 ; Source tokens/s 3484 ; Perplexity 6.09
> Validation perplexity: 7.1898761906535
> Epoch 3 ; Iteration 41550/41568 ; Learning rate 1.3537 ; Source tokens/s 3486 ; Perplexity 5.39
> Validation perplexity: 6.5590207265228
> Epoch 4 ; Iteration 41550/41568 ; Learning rate 1.2861 ; Source tokens/s 3487 ; Perplexity 5.02
> Validation perplexity: 6.2865323023127
> Epoch 5 ; Iteration 41550/41568 ; Learning rate 1.2218 ; Source tokens/s 3417 ; Perplexity 4.77
> Validation perplexity: 6.1607023175256
> Epoch 6 ; Iteration 41550/41568 ; Learning rate 1.1607 ; Source tokens/s 3489 ; Perplexity 4.59
> Validation perplexity: 6.1120954671075
> Epoch 7 ; Iteration 41550/41568 ; Learning rate 1.1026 ; Source tokens/s 3490 ; Perplexity 4.44
> Validation perplexity: 5.9609019604981
> Epoch 8 ; Iteration 41550/41568 ; Learning rate 1.0475 ; Source tokens/s 3496 ; Perplexity 4.32
> Validation perplexity: 5.8425269160052
> Epoch 9 ; Iteration 41550/41568 ; Learning rate 0.9951 ; Source tokens/s 3499 ; Perplexity 4.21
> Validation perplexity: 5.8363649325191
> Epoch 10 ; Iteration 41550/41568 ; Learning rate 0.9454 ; Source tokens/s 3500 ; Perplexity 4.12
> Validation perplexity: 5.8436381053213
> Epoch 11 ; Iteration 41550/41568 ; Learning rate 0.8981 ; Source tokens/s 3500 ; Perplexity 4.03
> Validation perplexity: 5.8997290808584
> Epoch 12 ; Iteration 41550/41568 ; Learning rate 0.8532 ; Source tokens/s 3501 ; Perplexity 3.96
> Validation perplexity: 5.8911250442504
> Epoch 13 ; Iteration 41550/41568 ; Learning rate 0.8105 ; Source tokens/s 3500 ; Perplexity 3.89
> Validation perplexity: 5.7969499523339
> Epoch 14 ; Iteration 41550/41568 ; Learning rate 0.7700 ; Source tokens/s 3464 ; Perplexity 3.83
> Validation perplexity: 6.0838147564159
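A side note on the learning-rate column: the values 1.5000, 1.4250, 1.3537, … are consistent with a fixed multiplicative decay of 0.95 per epoch. A quick reproduction in plain Python (the 0.95 is inferred from the log itself, not taken from any config option):

```python
# Reproduce the learning-rate column of the log above, assuming a
# constant multiplicative decay of 0.95 per epoch (inferred from the
# logged values themselves, not from the training config).
lr = 1.5
for epoch in range(1, 15):
    print(f"Epoch {epoch:2d}: learning rate {lr:.4f}")
    lr *= 0.95
```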
Then I took the model saved at epoch 8 and relaunched training with exactly the same parameters (again with a learning rate > 1), but this time with both fix_word_vecs options enabled. In this case, training seems to diverge!?
> Epoch 9 ; Iteration 50/41568 ; Learning rate 1.5000 ; Source tokens/s 1227 ; Perplexity 4.34
> Epoch 9 ; Iteration 100/41568 ; Learning rate 1.5000 ; Source tokens/s 1744 ; Perplexity 4.37
> ...
> Epoch 9 ; Iteration 41550/41568 ; Learning rate 1.5000 ; Source tokens/s 3487 ; Perplexity 4.75
> Validation perplexity: 6.5090732511643
> Epoch 10 ; Iteration 41550/41568 ; Learning rate 1.4250 ; Source tokens/s 3500 ; Perplexity 4.67
> Validation perplexity: 6.281285318457
> Epoch 11 ; Iteration 41550/41568 ; Learning rate 1.3537 ; Source tokens/s 3492 ; Perplexity 57074.22
> Validation perplexity: 40601884600.174
> Epoch 12 ; Iteration 41550/41568 ; Learning rate 1.2861 ; Source tokens/s 3478 ; Perplexity 72631876.57
> Validation perplexity: 1558787159.9703
??
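For scale: perplexity is the exponential of the average per-token cross-entropy, so taking the log of the reported values shows the loss jumping from about 1.5 to about 11, then 18, i.e. a genuine divergence rather than noise. A quick sanity check in plain Python (values copied from the log above):

```python
import math

# Perplexity = exp(mean per-token cross-entropy), so log(perplexity)
# recovers the underlying loss. Values copied from the log above.
for ppl in (4.67, 57074.22, 72631876.57):
    print(f"perplexity {ppl:>14.2f} -> loss {math.log(ppl):5.2f}")
```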
PS: relaunched with the learning rate at 1, to see if that changes anything…