Hi,
I was trying to reproduce the experiment on CNNDM.
I was following the instructions in docs/source/examples/summary/Summarization.md.
I downloaded the dataset from the link provided in those instructions.
When I ran the command
onmt_build_vocab -config cnndm.yaml -n_sample -1
it printed warnings like these:
[2023-05-06 18:12:58,051 WARNING] Empty line  in cnndm#40501.
[2023-05-06 18:12:58,689 WARNING] Empty line  in cnndm#41440.
[2023-05-06 18:13:00,229 WARNING] Empty line  in cnndm#43685.
[2023-05-06 18:13:01,120 WARNING] Empty line  in cnndm#45040.
[2023-05-06 18:13:01,361 WARNING] Empty line  in cnndm#45369.
[2023-05-06 18:13:04,510 WARNING] Empty line  in cnndm#49937.
[2023-05-06 18:13:07,184 WARNING] Empty line  in cnndm#53760.
[2023-05-06 18:13:07,808 WARNING] Empty line  in cnndm#54642.
Is this normal? I checked the actual training data and could not find any empty lines.
Should I ignore the warnings and keep building the vocab?
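
To double-check the data, a script along these lines could list lines that are empty or contain only whitespace (spaces, tabs, non-breaking spaces), which are easy to miss when inspecting the file by eye. This is only a minimal sketch: `find_blank_lines` is a hypothetical helper, not part of OpenNMT-py, and it assumes the files are UTF-8 text with one example per line. It reports 1-based line numbers; whether the number after `cnndm#` in the warnings is 0-based or 1-based is not something I have confirmed, so the neighbouring lines may need checking too.

```python
import sys


def find_blank_lines(path):
    """Return 1-based numbers of lines that are empty or whitespace-only."""
    blank = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # str.strip() removes all Unicode whitespace, so a line holding
            # only spaces, tabs or a non-breaking space counts as blank.
            if not line.strip():
                blank.append(number)
    return blank


if __name__ == "__main__":
    # Usage: python check_blank.py <source file> <target file> ...
    for name in sys.argv[1:]:
        hits = find_blank_lines(name)
        print(f"{name}: {len(hits)} blank line(s)", hits[:20])
```

Running it on both the source and the target file paths listed in cnndm.yaml would show whether either side has blank lines near the reported positions.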
Thanks.