Large perplexity

I got a small perplexity when training on 175,523 sentences.
However, I got a large perplexity when training on 353,978 sentences.
If there are illegal characters in the data, will I get a large perplexity?

th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo-model -save_every 10000

[11/06/18 06:13:24 INFO] Using GPU(s): 1
[11/06/18 06:13:24 INFO] Training Sequence to Sequence with Attention model…
[11/06/18 06:13:24 INFO] Loading data from ‘data/demo-train.t7’…
[11/06/18 06:13:38 INFO] * vocabulary size: source = 58229; target = 57576
[11/06/18 06:13:38 INFO] * additional features: source = 0; target = 0
[11/06/18 06:13:38 INFO] * maximum sequence length: source = 50; target = 51
[11/06/18 06:13:38 INFO] * number of training sentences: 353978
[11/06/18 06:13:38 INFO] * number of batches: 4975
[11/06/18 06:13:38 INFO] - source sequence lengths: equal
[11/06/18 06:13:38 INFO] - maximum size: 160 sentences / 1800 tokens
[11/06/18 06:13:38 INFO] - average size: 71.15
[11/06/18 06:13:38 INFO] - capacity: 100.00%
[11/06/18 06:13:38 INFO] Building model…
[11/06/18 06:13:38 INFO] * Encoder:
[11/06/18 06:13:43 INFO] - word embeddings size: 500
[11/06/18 06:13:43 INFO] - type: unidirectional RNN
[11/06/18 06:13:43 INFO] - structure: cell = LSTM; layers = 2; rnn_size = 500; dropout = 0.3 (naive)
[11/06/18 06:13:43 INFO] * Decoder:
[11/06/18 06:13:45 INFO] - word embeddings size: 500
[11/06/18 06:13:45 INFO] - attention: global (general)
[11/06/18 06:13:45 INFO] - structure: cell = LSTM; layers = 2; rnn_size = 500; dropout = 0.3 (naive)
[11/06/18 06:13:46 INFO] * Bridge: copy
[11/06/18 06:13:48 INFO] Initializing parameters…
[11/06/18 06:13:51 INFO] * number of parameters: 96514076
[11/06/18 06:13:51 INFO] Preparing memory optimization…
[11/06/18 06:13:51 INFO] * sharing 69% of output/gradInput tensors memory between clones
[11/06/18 06:13:51 INFO] Preallocating memory
[11/06/18 06:13:59 INFO] Start training from epoch 1 to 13…
[11/06/18 06:14:42 INFO] Epoch 1 ; Iteration 50/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2011 ; Perplexity 124101.17
[11/06/18 06:15:24 INFO] Epoch 1 ; Iteration 100/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2053 ; Perplexity 30814.61
[11/06/18 06:16:05 INFO] Epoch 1 ; Iteration 150/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2097 ; Perplexity 5820.47

…
[11/06/18 07:22:57 INFO] Epoch 1 ; Iteration 4950/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2045 ; Perplexity 2.20
[11/06/18 07:23:19 INFO] Epoch 1 ; Iteration 4975/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2053 ; Perplexity 2.19
[11/06/18 07:23:19 INFO] Evaluating on the validation dataset…
[11/06/18 07:23:41 INFO] Validation perplexity: 2.30
[11/06/18 07:23:41 INFO] Saving checkpoint to ‘demo-model_epoch1_2.30.t7’…
[11/06/18 07:23:43 INFO]
[11/06/18 07:24:26 INFO] Epoch 2 ; Iteration 50/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2017 ; Perplexity 2.18
[11/06/18 07:25:09 INFO] Epoch 2 ; Iteration 100/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 1965 ; Perplexity 2.24
[11/06/18 07:25:52 INFO] Epoch 2 ; Iteration 150/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2060 ; Perplexity 2.24


[11/06/18 08:04:14 INFO] Epoch 2 ; Iteration 2900/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2054 ; Perplexity 3.97
[11/06/18 08:04:56 INFO] Epoch 2 ; Iteration 2950/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2110 ; Perplexity 3.92
[11/06/18 08:05:35 INFO] Epoch 2 ; Iteration 3000/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2153 ; Perplexity 4.44
[11/06/18 08:06:14 INFO] Epoch 2 ; Iteration 3050/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2211 ; Perplexity 4.41
[11/06/18 08:06:56 INFO] Epoch 2 ; Iteration 3100/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2086 ; Perplexity 5.79
[11/06/18 08:07:37 INFO] Epoch 2 ; Iteration 3150/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2139 ; Perplexity 5.08
[11/06/18 08:08:19 INFO] Epoch 2 ; Iteration 3200/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2083 ; Perplexity 7.80
[11/06/18 08:09:04 INFO] Epoch 2 ; Iteration 3250/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 1969 ; Perplexity 18.34
[11/06/18 08:09:45 INFO] Epoch 2 ; Iteration 3300/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2123 ; Perplexity 16.58
[11/06/18 08:10:28 INFO] Epoch 2 ; Iteration 3350/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2055 ; Perplexity 66.23
[11/06/18 08:11:09 INFO] Epoch 2 ; Iteration 3400/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2120 ; Perplexity 196.71
[11/06/18 08:11:49 INFO] Epoch 2 ; Iteration 3450/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2179 ; Perplexity 517.97
[11/06/18 08:12:32 INFO] Epoch 2 ; Iteration 3500/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2029 ; Perplexity 1592.08
[11/06/18 08:13:15 INFO] Epoch 2 ; Iteration 3550/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2051 ; Perplexity 5019.31
[11/06/18 08:13:57 INFO] Epoch 2 ; Iteration 3600/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2102 ; Perplexity 2413.03
[11/06/18 08:14:40 INFO] Epoch 2 ; Iteration 3650/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2013 ; Perplexity 2906.25
[11/06/18 08:15:21 INFO] Epoch 2 ; Iteration 3700/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2116 ; Perplexity 4174.21
[11/06/18 08:16:01 INFO] Epoch 2 ; Iteration 3750/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2075 ; Perplexity 19779.39
[11/06/18 08:16:42 INFO] Epoch 2 ; Iteration 3800/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2133 ; Perplexity 184239.71
[11/06/18 08:17:25 INFO] Epoch 2 ; Iteration 3850/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2049 ; Perplexity 98011.05
[11/06/18 08:18:08 INFO] Epoch 2 ; Iteration 3900/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2077 ; Perplexity 24580.07
[11/06/18 08:18:52 INFO] Epoch 2 ; Iteration 3950/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2008 ; Perplexity 44337.66
[11/06/18 08:19:34 INFO] Epoch 2 ; Iteration 4000/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2069 ; Perplexity 446725.89
[11/06/18 08:20:16 INFO] Epoch 2 ; Iteration 4050/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2098 ; Perplexity 1155432891693.05
[11/06/18 08:20:54 INFO] Epoch 2 ; Iteration 4100/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2246 ; Perplexity 17246.49
[11/06/18 08:21:36 INFO] Epoch 2 ; Iteration 4150/4975 ; Optim SGD LR 1.000000 ; Source tokens/s 2123 ; Perplexity 8107.29


[11/06/18 08:33:08 INFO] Evaluating on the validation dataset…
[11/06/18 08:33:30 INFO] Validation perplexity: 5823.04
[11/06/18 08:33:30 INFO] Saving checkpoint to ‘demo-model_epoch2_5823.04.t7’…

I have a 1,000k-record file (file0).
Test 1: head -n 200000 file0 > file1 -> I got a small perplexity
Test 2: head -n 210000 file0 > file2 -> I got a large perplexity
Test 3: head -n 200000 file0 > file3; head -n 10000 file0 >> file3 -> I got a large perplexity
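To isolate whether those extra 10,000 lines themselves are the cause, here is a minimal sketch (assuming GNU coreutils; file0 is the 1,000k-record file above, and the output file name is arbitrary) of how to extract only the lines added between Test 1 and Test 2 so they can be inspected or preprocessed separately:

# extract lines 200,001-210,000, i.e. exactly the data added in Test 2
sed -n '200001,210000p' file0 > added-10k.txt

# equivalent with tail and head
tail -n +200001 file0 | head -n 10000 > added-10k.txt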

I do not know whether the cause of the problem is (a) or (b):
(a) The training data exceeds 200,000 sentences.
(b) An illegal character exists in the training data.
I cannot find any illegal characters in the training data.
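In case it helps, a minimal sketch of how such characters could be searched for (assuming GNU grep and iconv; the file name is the training source file from the preprocess command below, and the same checks apply to the target file):

# print lines that are not valid UTF-8 (run in a UTF-8 locale)
grep -naxv '.*' data/src-train-sentence-ja-wakati.txt

# print lines containing ASCII control characters other than tab
grep -naP '[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]' data/src-train-sentence-ja-wakati.txt

# the whole file should round-trip cleanly as UTF-8
iconv -f UTF-8 -t UTF-8 data/src-train-sentence-ja-wakati.txt > /dev/null && echo "encoding OK"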

Please help me figure out how to get a small perplexity again.
Small run: 175,523 sentences, epoch = 13 -> perplexity = 1.41 -> translation: somewhat good
Large run: 353,978 sentences, epoch = 50 -> perplexity = 14.65 -> translation: very bad

th preprocess.lua -src_vocab_size 400000 -tgt_vocab_size 400000 -train_src data/src-train-sentence-ja-wakati.txt -train_tgt data/tgt-train-sentence-ja-wakati.txt -valid_src data/src-val-sentence-ja-wakati.txt -valid_tgt data/tgt-val-sentence-ja-wakati.txt -save_data data/demo

[11/05/18 07:27:42 INFO] Using on-the-fly ‘space’ tokenization for input 1
[11/05/18 07:27:42 INFO] Using on-the-fly ‘space’ tokenization for input 2
[11/05/18 07:27:43 INFO] Preparing vocabulary…
[11/05/18 07:27:43 INFO] * Building source vocabularies…
[11/05/18 07:27:48 INFO] * Created word dictionary of size 41798 (pruned from 41798)
[11/05/18 07:27:48 INFO]
[11/05/18 07:27:48 INFO] * Building target vocabularies…
[11/05/18 07:27:53 INFO] * Created word dictionary of size 41379 (pruned from 41379)
[11/05/18 07:27:53 INFO]
[11/05/18 07:27:53 INFO] Preparing training data…
[11/05/18 07:27:53 INFO] — Preparing train sample
[11/05/18 07:28:08 INFO] * [-] file ‘data/src-train-sentence-ja-wakati.txt’ (): 210000 total, 210000 drawn, 175523 kept - unknown words: source = 0.0%, target = 0.0%
[11/05/18 07:28:08 INFO] … shuffling sentences
[11/05/18 07:28:09 INFO] … sorting sentences by size
[11/05/18 07:28:10 INFO] Prepared 175523 sentences:
[11/05/18 07:28:10 INFO] * 34477 sequences not validated (length, other)
[11/05/18 07:28:10 INFO] * average sequence length: source = 24.7, target = 24.7
[11/05/18 07:28:10 INFO] * source sentence length (range of 10): [ 12% ; 18% ; 24% ; 17% ; 10% ; 5% ; 3% ; 1% ; 1% ; 5% ]
[11/05/18 07:28:10 INFO] * target sentence length (range of 10): [ 12% ; 18% ; 24% ; 17% ; 10% ; 5% ; 3% ; 1% ; 1% ; 5% ]
[11/05/18 07:28:10 INFO]
[11/05/18 07:28:10 INFO]
[11/05/18 07:28:10 INFO] Preparing validation data…
[11/05/18 07:28:10 INFO] — Preparing valid sample
[11/05/18 07:28:11 INFO] * [-] file ‘data/src-val-sentence-ja-wakati.txt’ (): 6000 total, 6000 drawn, 4701 kept - unknown words: source = 1.1%, target = 1.0%
[11/05/18 07:28:11 INFO] … shuffling sentences
[11/05/18 07:28:11 INFO] … sorting sentences by size
[11/05/18 07:28:11 INFO] Prepared 4701 sentences:
[11/05/18 07:28:11 INFO] * 1299 sequences not validated (length, other)
[11/05/18 07:28:11 INFO] * average sequence length: source = 27.5, target = 27.5
[11/05/18 07:28:11 INFO] * source sentence length (range of 10): [ 7% ; 13% ; 24% ; 19% ; 12% ; 7% ; 4% ; 2% ; 1% ; 7% ]
[11/05/18 07:28:11 INFO] * target sentence length (range of 10): [ 7% ; 13% ; 24% ; 19% ; 12% ; 7% ; 3% ; 2% ; 1% ; 7% ]
[11/05/18 07:28:11 INFO]
[11/05/18 07:28:11 INFO]
[11/05/18 07:28:11 INFO] Saving source vocabulary to ‘data/demo.src.dict’…
[11/05/18 07:28:11 INFO] Saving target vocabulary to ‘data/demo.tgt.dict’…
[11/05/18 07:28:11 INFO] Saving data to ‘data/demo-train.t7’…

th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo-model -save_every 10000

-rw-r--r-- 1 root root 666033249 Nov 5 07:58 demo-model_epoch1_2.80.t7
-rw-r--r-- 1 root root 666033249 Nov 5 08:27 demo-model_epoch2_2.30.t7
-rw-r--r-- 1 root root 666033249 Nov 5 08:56 demo-model_epoch3_2.32.t7
-rw-r--r-- 1 root root 666033249 Nov 5 09:25 demo-model_epoch4_1.81.t7
-rw-r--r-- 1 root root 666033249 Nov 5 09:54 demo-model_epoch5_1.62.t7
-rw-r--r-- 1 root root 666033249 Nov 5 10:24 demo-model_epoch6_1.53.t7
-rw-r--r-- 1 root root 666033249 Nov 5 10:53 demo-model_epoch7_1.49.t7
-rw-r--r-- 1 root root 666033249 Nov 5 11:22 demo-model_epoch8_1.45.t7
-rw-r--r-- 1 root root 666033249 Nov 5 11:51 demo-model_epoch9_1.44.t7
-rw-r--r-- 1 root root 666033249 Nov 5 12:20 demo-model_epoch10_1.43.t7
-rw-r--r-- 1 root root 666033249 Nov 5 12:49 demo-model_epoch11_1.42.t7
-rw-r--r-- 1 root root 666033249 Nov 5 13:18 demo-model_epoch12_1.41.t7
-rw-r--r-- 1 root root 666033249 Nov 5 13:47 demo-model_epoch13_1.41.t7

th preprocess.lua -src_vocab_size 400000 -tgt_vocab_size 400000 -train_src data/src-train-sentence-ja-wakati.txt -train_tgt data/tgt-train-sentence-ja-wakati.txt -valid_src data/src-val-sentence-ja-wakati.txt -valid_tgt data/tgt-val-sentence-ja-wakati.txt -save_data data/demo

[11/08/18 08:40:02 INFO] Using on-the-fly ‘space’ tokenization for input 1
[11/08/18 08:40:02 INFO] Using on-the-fly ‘space’ tokenization for input 2
[11/08/18 08:40:03 INFO] Preparing vocabulary…
[11/08/18 08:40:03 INFO] * Building source vocabularies…
[11/08/18 08:40:13 INFO] * Created word dictionary of size 58229 (pruned from 58229)
[11/08/18 08:40:13 INFO]
[11/08/18 08:40:13 INFO] * Building target vocabularies…
[11/08/18 08:40:23 INFO] * Created word dictionary of size 57576 (pruned from 57576)
[11/08/18 08:40:23 INFO]
[11/08/18 08:40:23 INFO] Preparing training data…
[11/08/18 08:40:23 INFO] — Preparing train sample
[11/08/18 08:40:51 INFO] * [-] file ‘data/src-train-sentence-ja-wakati.txt’ (): 420000 total, 420000 drawn, 353978 kept - unknown words: source = 0.0%, target = 0.0%
[11/08/18 08:40:51 INFO] … shuffling sentences
[11/08/18 08:40:53 INFO] … sorting sentences by size
[11/08/18 08:40:55 INFO] Prepared 353978 sentences:
[11/08/18 08:40:55 INFO] * 66022 sequences not validated (length, other)
[11/08/18 08:40:55 INFO] * average sequence length: source = 24.5, target = 24.5
[11/08/18 08:40:55 INFO] * source sentence length (range of 10): [ 12% ; 18% ; 24% ; 17% ; 10% ; 5% ; 2% ; 1% ; 1% ; 5% ]
[11/08/18 08:40:55 INFO] * target sentence length (range of 10): [ 12% ; 18% ; 24% ; 17% ; 10% ; 5% ; 2% ; 1% ; 1% ; 5% ]
[11/08/18 08:40:55 INFO]
[11/08/18 08:40:55 INFO]
[11/08/18 08:40:55 INFO] Preparing validation data…
[11/08/18 08:40:55 INFO] — Preparing valid sample
[11/08/18 08:40:56 INFO] * [-] file ‘data/src-val-sentence-ja-wakati.txt’ (): 6000 total, 6000 drawn, 4741 kept - unknown words: source = 0.8%, target = 0.7%
[11/08/18 08:40:56 INFO] … shuffling sentences
[11/08/18 08:40:56 INFO] … sorting sentences by size
[11/08/18 08:40:56 INFO] Prepared 4741 sentences:
[11/08/18 08:40:56 INFO] * 1259 sequences not validated (length, other)
[11/08/18 08:40:56 INFO] * average sequence length: source = 27.3, target = 27.3
[11/08/18 08:40:56 INFO] * source sentence length (range of 10): [ 7% ; 13% ; 25% ; 19% ; 12% ; 7% ; 3% ; 2% ; 1% ; 7% ]
[11/08/18 08:40:56 INFO] * target sentence length (range of 10): [ 7% ; 13% ; 25% ; 19% ; 12% ; 7% ; 3% ; 2% ; 1% ; 7% ]
[11/08/18 08:40:56 INFO]
[11/08/18 08:40:56 INFO]
[11/08/18 08:40:56 INFO] Saving source vocabulary to ‘data/demo.src.dict’…
[11/08/18 08:40:56 INFO] Saving target vocabulary to ‘data/demo.tgt.dict’…
[11/08/18 08:40:56 INFO] Saving data to ‘data/demo-train.t7’…

th train.lua -gpuid 1 -data data/demo-train.t7 -save_model demo-model -save_every 10000 -end_epoch 50 -log_file log.txt

-rw-r--r-- 1 root root 895950863 Nov 8 09:49 demo-model_epoch1_2.30.t7
-rw-r--r-- 1 root root 895950863 Nov 8 10:57 demo-model_epoch2_5823.04.t7
-rw-r--r-- 1 root root 895950863 Nov 8 12:04 demo-model_epoch3_487.83.t7
-rw-r--r-- 1 root root 895950863 Nov 8 13:11 demo-model_epoch4_1246.04.t7
-rw-r--r-- 1 root root 895950863 Nov 8 14:18 demo-model_epoch5_154.12.t7
-rw-r--r-- 1 root root 895950863 Nov 8 15:26 demo-model_epoch6_104.21.t7
-rw-r--r-- 1 root root 895950863 Nov 8 16:33 demo-model_epoch7_79.58.t7
-rw-r--r-- 1 root root 895950863 Nov 8 17:41 demo-model_epoch8_55.90.t7
-rw-r--r-- 1 root root 895950863 Nov 8 18:48 demo-model_epoch9_44.96.t7
-rw-r--r-- 1 root root 895950863 Nov 8 19:56 demo-model_epoch10_36.76.t7
-rw-r--r-- 1 root root 895950863 Nov 8 21:04 demo-model_epoch11_30.79.t7
-rw-r--r-- 1 root root 895950863 Nov 8 22:12 demo-model_epoch12_26.57.t7
-rw-r--r-- 1 root root 895950863 Nov 8 23:19 demo-model_epoch13_22.62.t7
-rw-r--r-- 1 root root 895950863 Nov 9 00:27 demo-model_epoch14_20.53.t7
-rw-r--r-- 1 root root 895950863 Nov 9 01:35 demo-model_epoch15_18.55.t7
-rw-r--r-- 1 root root 895950863 Nov 9 02:42 demo-model_epoch16_17.27.t7
-rw-r--r-- 1 root root 895950863 Nov 9 03:50 demo-model_epoch17_16.63.t7
-rw-r--r-- 1 root root 895950863 Nov 9 04:57 demo-model_epoch18_15.95.t7
-rw-r--r-- 1 root root 895950863 Nov 9 06:04 demo-model_epoch19_15.63.t7
-rw-r--r-- 1 root root 895950863 Nov 9 07:12 demo-model_epoch20_15.48.t7
-rw-r--r-- 1 root root 895950863 Nov 9 08:19 demo-model_epoch21_15.09.t7
-rw-r--r-- 1 root root 895950863 Nov 9 09:26 demo-model_epoch22_14.98.t7
-rw-r--r-- 1 root root 895950863 Nov 9 10:34 demo-model_epoch23_14.92.t7
-rw-r--r-- 1 root root 895950863 Nov 9 11:41 demo-model_epoch24_14.79.t7
-rw-r--r-- 1 root root 895950863 Nov 9 12:49 demo-model_epoch25_14.77.t7
-rw-r--r-- 1 root root 895950863 Nov 9 13:56 demo-model_epoch26_14.74.t7
-rw-r--r-- 1 root root 895950863 Nov 9 15:04 demo-model_epoch27_14.74.t7
-rw-r--r-- 1 root root 895950863 Nov 9 16:11 demo-model_epoch28_14.70.t7
-rw-r--r-- 1 root root 895950863 Nov 9 17:18 demo-model_epoch29_14.66.t7
-rw-r--r-- 1 root root 895950863 Nov 9 18:26 demo-model_epoch30_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 9 19:34 demo-model_epoch31_14.66.t7
-rw-r--r-- 1 root root 895950863 Nov 9 20:41 demo-model_epoch32_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 9 21:49 demo-model_epoch33_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 9 22:56 demo-model_epoch34_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 00:04 demo-model_epoch35_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 01:11 demo-model_epoch36_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 02:19 demo-model_epoch37_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 03:26 demo-model_epoch38_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 04:33 demo-model_epoch39_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 05:41 demo-model_epoch40_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 06:48 demo-model_epoch41_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 07:55 demo-model_epoch42_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 09:03 demo-model_epoch43_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 10:10 demo-model_epoch44_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 11:17 demo-model_epoch45_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 12:25 demo-model_epoch46_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 13:32 demo-model_epoch47_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 14:40 demo-model_epoch48_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 15:47 demo-model_epoch49_14.65.t7
-rw-r--r-- 1 root root 895950863 Nov 10 16:55 demo-model_epoch50_14.65.t7

How did you tokenize your training file?

I am trying to translate from Japanese to Japanese.
I used mecab-ipadic-neologd to tokenize.
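For reference, tokenizing with mecab-ipadic-neologd typically looks something like this (a sketch; the -d dictionary path depends on where the neologd dictionary is installed on your system, and the untokenized input file name here is only an example):

mecab -d /usr/lib/mecab/dic/mecab-ipadic-neologd -Owakati < src-train-sentence-ja.txt > data/src-train-sentence-ja-wakati.txt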

Hello!

I would like to use case_feature for Japanese sentences.
Is there a rule for the “|character” of case_feature?
I found │C, │L, │N, │U, │M in the sample data.
Can I use another “|character”?

Yes. You can use any string as a feature value, not only single characters.
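For example, using the same │ separator that appears in the sample data, a feature-annotated training line can use the single-letter values or strings you define yourself (the tokens and labels below are only illustrative):

今日│N は│N 東京│N に│N 行く│N
今日│noun は│particle 東京│propn に│particle 行く│verb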