(OpenNMT-py version here)
Motivations
- Replicate results for the Text Summarization task on Gigaword (see “About”)
- Get started with Text Summarization using OpenNMT (src)
- Get started with ROUGE scoring using files2rouge (src)
About
- Reference: http://opennmt.net//Models/#english-summarization
- Dataset: https://github.com/harvardnlp/sent-summary
- Expected results:
  - R1: 33.13
  - R2: 16.09
  - RL: 31.00
- OpenNMT v0.2.0 (precisely, commit 561994adcd147f9f77cc744a041152c3182a9300 from Jan. 4, 2017)
- files2rouge v0.2
Setup
git clone https://github.com/OpenNMT/OpenNMT.git opennmt
git clone --recursive https://github.com/pltrdy/files2rouge.git files2rouge
Download data from here and extract it (tar -xzf summary.tar.gz) to ./data.
We assume that your file system is like:
./
    opennmt/
    data/
    files2rouge/
Building model
Following the guide
# First, move to OpenNMT dir
cd opennmt
1) Preprocess
th preprocess.lua -train_src ../data/train/train.article.txt -train_tgt ../data/train/train.title.txt -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum
2) Train
th train.lua -data ../data/train/textsum-train.t7 -save_model textsum
or using GPU:
th train.lua -data ../data/train/textsum-train.t7 -save_model textsum -gpuid 1
3) Generate summary
th translate.lua -model textsum_final.t7 -src ../data/Giga/inputs.txt
(add -gpuid 1 if you trained the model using GPU)
The output will be in pred.txt.
ROUGE Scoring using files2rouge
cd ../files2rouge
./files2rouge ../opennmt/pred.txt ../data/Giga/task1_ref0.txt
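For intuition, ROUGE-N is essentially n-gram overlap between the generated summary and the reference. Below is a minimal Python sketch of ROUGE-N recall; it is an illustration only, not the official scorer (files2rouge wraps the standard ROUGE script, which additionally handles stemming, multiple references, and F-measure):

```python
# Minimal sketch of ROUGE-N recall on whitespace tokens.
# Illustration only; the official ROUGE script applies stemming
# and supports precision/recall/F-measure variants.
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    """Overlapping n-grams divided by the number of n-grams in the reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return overlap / sum(ref.values())

# Toy example (made-up sentences): 3 of the 5 reference unigrams match.
ref = "police arrest five anti-nuclear protesters"
cand = "police arrested five protesters"
print(rouge_n_recall(cand, ref, 1))  # → 0.6
```

R1 and R2 in the tables above correspond to n=1 and n=2.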
Results
- ROUGE-1: 34.2
- ROUGE-2: 16.2
- ROUGE-L: 31.9
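The ROUGE-L score is based on the longest common subsequence (LCS) between candidate and reference rather than fixed-size n-grams, so it rewards in-order matches even when they are not contiguous. A minimal sketch of ROUGE-L recall (again an illustration, not the official scorer):

```python
# Minimal sketch of ROUGE-L recall: LCS length over whitespace tokens,
# divided by reference length. Illustration only; the official ROUGE
# script adds stemming and F-measure options.
def lcs_len(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(candidate, reference):
    ref = reference.split()
    return lcs_len(candidate.split(), ref) / len(ref) if ref else 0.0

# Toy example (made-up sentences): LCS is "police five protesters" (length 3).
print(rouge_l_recall("police arrested five protesters",
                     "police arrest five anti-nuclear protesters"))  # → 0.6
```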