Text Summarization on Gigaword and ROUGE Scoring

pltrdy · January 10, 2017, 2:56pm

(OpenNMT-py version here)

Motivations

Replicate results for Text Summarization task on Gigaword (see ‘About’)
Getting started with Text Summarization using OpenNMT (src)
Getting started with ROUGE scoring using files2rouge (src)

About

Reference: http://opennmt.net/Models/#english-summarization
Dataset: https://github.com/harvardnlp/sent-summary
Expected results:
- R1: 33.13
- R2: 16.09
- RL: 31.00
OpenNMT v0.2.0 (precisely, the commit from January 4, 2017: 561994adcd147f9f77cc744a041152c3182a9300)
files2rouge v0.2

Setup

git clone https://github.com/OpenNMT/OpenNMT.git opennmt
git clone --recursive https://github.com/pltrdy/files2rouge.git files2rouge

Download the data from the dataset repository (https://github.com/harvardnlp/sent-summary) and extract it (tar -xzf summary.tar.gz) to ./data.

We assume your directory layout looks like this:

./
  opennmt/
  data/
  files2rouge/

Building model

Following the guide

# First, move to OpenNMT dir
cd opennmt

1) Preprocess

th preprocess.lua -train_src ../data/train/train.article.txt -train_tgt ../data/train/train.title.txt -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum

2) Train

th train.lua -data ../data/train/textsum-train.t7  -save_model textsum

or using GPU:

th train.lua -data ../data/train/textsum-train.t7  -save_model textsum -gpuid 1

3) Generate summary

th translate.lua -model textsum_final.t7 -src ../data/Giga/input.txt

(add -gpuid 1 if you trained the model using GPU)
The output will be in pred.txt

ROUGE Scoring using `files2rouge`

cd ../files2rouge
./files2rouge ../opennmt/pred.txt ../data/Giga/task1_ref0.txt

Results

ROUGE-1: 34.2
ROUGE-2: 16.2
ROUGE-L: 31.9
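
For readers new to ROUGE, here is a minimal sketch of what a ROUGE-1 F-score measures: clipped unigram overlap between a prediction and a reference. This is only an illustration, not the ROUGE-1.5.5 implementation that files2rouge wraps; it assumes whitespace tokenization, applies no stemming or stopword handling, and both example sentences are made up.

```python
from collections import Counter


def rouge_1_f(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    # Clipped overlap: a token counts at most as often as it occurs in both.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Both sentences are hypothetical examples in the style of the dataset.
score = rouge_1_f(
    "nestle first-half net profit falls # percent",
    "nestle reports fall in first-half net profit",
)
print(f"ROUGE-1 (F) = {score:.4f}")
```

The numbers reported in this thread are averages of such per-pair scores over the whole test set, shown either as percentages (34.2) or as fractions (0.342).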

srush · January 10, 2017, 4:59pm

(I can confirm that this is roughly what I did for the model posted here http://opennmt.net/Models/ )

marcejohnson · January 31, 2017, 4:30pm

Thank you Paul, really super helpful. It ran the first time!

livenletdie · March 26, 2017, 7:12pm

Thank you for this helpful tutorial on training a network for text summarization.

What is the perplexity of this model on the Gigaword benchmark? When I run translate.lua on the model I trained, it is in the high 70s. That seems very high. Is this expected, or is something wrong in my setup? The ROUGE scores I get are in the range shown above (34, 17, 33).

Command I used for computing perplexity: th translate.lua -model textsum_epoch13_11.98.t7 -src …/data/sumdata/Giga/input.txt -tgt …/data/sumdata/Giga/task1_ref0.txt

Thanks
Ganesh

pltrdy · March 27, 2017, 1:04pm

Your perplexity result must be correct if you followed the tutorial and got the same ROUGE scores.

twang · April 22, 2017, 5:33pm

Thank you very much for this helpful tutorial! I followed your instructions and replicated the results using OpenNMT-py. The steps are nearly the same:

Setup

git clone https://github.com/OpenNMT/OpenNMT-py.git opennmt
git clone --recursive https://github.com/pltrdy/files2rouge.git files2rouge
cd opennmt

Download and extract the data, assuming the same directory layout as above.

Preprocess

python preprocess.py -train_src ../data/train/train.article.txt -train_tgt ../data/train/train.title.txt -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum

Train

python train.py -data ../data/train/textsum.train.pt -save_model textsum -gpus 0

In my experiment, the model was trained for 13 epochs on 1 GPU, which took about 50 hours. The last saved model is textsum_acc_51.38_ppl_12.59_e13.pt.

Generate summary

python translate.py -model textsum_acc_51.38_ppl_12.59_e13.pt -src ../data/Giga/input.txt -gpu 0

Rouge scoring

cd ../files2rouge
python files2rouge.py ../opennmt/pred.txt ../data/Giga/task1_ref0.txt

Results:
ROUGE-1 (F): 0.356620
ROUGE-2 (F): 0.175106
ROUGE-L (F): 0.334049

pltrdy · April 24, 2017, 8:02am

Great! I’ll reference your answer in the tuto. Thx

viratvivek · April 25, 2017, 12:46pm

I am getting this error:
warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
ROUGE-1 (F): nan
ROUGE-2 (F): nan
ROUGE-3 (F): nan
ROUGE-L (F): nan
ROUGE-S4 (F): nan

Also, many of the entries in task1_ref0.txt are UNK. I am using the pretrained model from here.

pltrdy · April 26, 2017, 8:35am

What are you trying to do?

Could you provide the command you ran and the full trace?

viratvivek · April 26, 2017, 8:56am

Everything went fine until the summary generation step. Then I ran this from the files2rouge directory:
python files2rouge.py ../opennmt/pred.txt ../data/Giga/task1_ref0.txt
and got this:

Evaluated 0 ref/summary pairs in 15.369 seconds (0.000 lines/sec)
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
ROUGE-1 (F): nan
ROUGE-2 (F): nan
ROUGE-3 (F): nan
ROUGE-L (F): nan
ROUGE-S4 (F): nan

pltrdy · April 26, 2017, 9:10am

You should look at your files more closely. It reports 0 ref/summary pairs but still took 15 seconds, which means it read something that ended up counting as nothing. The files may contain only out-of-vocabulary words.
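
A quick way to look at the files is to count total, blank, and unknown-only lines in each. The sketch below is a hypothetical helper, not part of files2rouge; the unknown-token spellings ("UNK", "<unk>") are assumptions and should be adjusted to whatever appears in your data.

```python
from pathlib import Path


def describe_lines(lines, unk_tokens=("UNK", "<unk>")):
    """Count total, blank, and unknown-only lines in a list of strings."""
    blank = sum(1 for line in lines if not line.strip())
    unk_only = sum(
        1
        for line in lines
        if line.strip() and all(tok in unk_tokens for tok in line.split())
    )
    return {"total": len(lines), "blank": blank, "unk_only": unk_only}


def describe_file(path, unk_tokens=("UNK", "<unk>")):
    """Same counts for a text file with one summary per line."""
    text = Path(path).read_text(encoding="utf-8")
    return describe_lines(text.splitlines(), unk_tokens)


# In-memory example; for real files call describe_file on each path.
sample = ["pakistan stocks end higher", "", "UNK UNK", "world cup"]
print(describe_lines(sample))
```

If the two files report different totals, or either reports blank lines, that is worth fixing before scoring.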

viratvivek · April 26, 2017, 9:22am

While generating summaries, I am getting negative PRED scores:
[04/26/17 09:21:39 INFO] PRED 598: arctic ocean gets up to normal
[04/26/17 09:21:39 INFO] PRED SCORE: -7.86
[04/26/17 09:21:39 INFO]
[04/26/17 09:21:39 INFO] SENT 599: nestle sa , the world 's biggest food and drink maker , reported wednesday a # percent fall in first-half net profit as the recession hurt consumer demand and divestments and a stronger swiss franc weighed on sales .
[04/26/17 09:21:39 INFO] PRED 599: nestle first-half net profit falls # percent
[04/26/17 09:21:39 INFO] PRED SCORE: -3.43
[04/26/17 09:21:39 INFO]
[04/26/17 09:21:39 INFO] SENT 600: president barack obama 's campaign for a health care overhaul is an intense installment in a long-running story , dating to theodore roosevelt in #### .
[04/26/17 09:21:39 INFO] PRED 600: obama s health care campaign is a hard sell
[04/26/17 09:21:39 INFO] PRED SCORE: -12.22
[04/26/17 09:21:39 INFO]

is that okay?

pltrdy · April 26, 2017, 9:38am

It is. Just open your …/opennmt/pred.txt & …/data/Giga/task1_ref0.txt to see what’s going on
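
Negative values are what you would expect if PRED SCORE is a log-probability: a probability is at most 1, so its logarithm is at most 0. Here is a minimal sketch of how such scores relate to perplexity, assuming each score is the summed natural-log probability of the predicted tokens and that tokens are counted by whitespace splitting without an end-of-sentence token. Both are assumptions; the exact normalization used by translate.lua may differ.

```python
import math


def perplexity(log_probs, token_counts):
    """exp(-total log-probability / total number of tokens)."""
    total_tokens = sum(token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to average over")
    return math.exp(-sum(log_probs) / total_tokens)


# Predictions and scores copied from the log above.
predictions = [
    ("arctic ocean gets up to normal", -7.86),
    ("nestle first-half net profit falls # percent", -3.43),
    ("obama s health care campaign is a hard sell", -12.22),
]
scores = [score for _, score in predictions]
lengths = [len(text.split()) for text, _ in predictions]
print(f"perplexity over {sum(lengths)} tokens: {perplexity(scores, lengths):.2f}")
```

Note that this is the perplexity of the model's own predictions. The perplexity obtained by passing -tgt is computed on the reference summaries, so it is normally higher.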

viratvivek · April 26, 2017, 9:38am

Getting this error on running the rouge script:
File "/home/namas2/files2rouge/pythonrouge/pythonrouge.py", line 65, in pythonrouge
output = subprocess.check_output([ROUGE_path, "-e", data_path, "-a", "-m", "-2", "4", "-n", "3", abs_xml_path], stderr=subprocess.STDOUT)
File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/ROUGE-1.5.5.pl', '-e', '/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/data', '-a', '-m', '-2', '4', '-n', '3', '/tmp/tmpOOwkC4/rouge.xml']' returned non-zero exit status 2
CalledProcessError: Command '['/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/ROUGE-1.5.5.pl', '-e', '/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/data', '-a', '-m', '-2', '4', '-n', '3', '/tmp/tmpHIEqoL/rouge.xml']' returned non-zero exit status 2
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/ROUGE-1.5.5.pl', '-e', '/home/namas2/files2rouge/pythonrouge/RELEASE-1.5.5/data', '-a', '-m', '-2', '4', '-n', '3', '/tmp/tmpOQOorE/rouge.xml']' returned non-zero exit status 2

Evaluated 0 ref/summary pairs in 40.412 seconds (0.000 lines/sec)
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
ROUGE-1 (F): nan
ROUGE-2 (F): nan
ROUGE-3 (F): nan
ROUGE-L (F): nan
ROUGE-S4 (F): nan

viratvivek · April 26, 2017, 9:45am

Following are the contents of pred.txt:
former corrupt official executed in sw china
launch of shenzhou-# manned spacecraft successful
interpol asks members to devise rules for security
chinese envoy urges un international community to continue supporting timor-leste
millions of hungry people will wake up hungry
ecuadorian president reiterates support for bolivian president
zimbabwean minister visits china
cambodian parties cancel decision to attend coalition talks
saudi crown prince calls on muslims to move on path of unity
israel launches third missile strike on gaza city
season
thousands attend funeral of palestinians killed in israeli airstrike
comesa to set up comesa common investment area
bulgarian police seize ## kg of heroin
european major stocks end lower
austrian fm to visit china
ballack recovers from ankle injury
rok rok launch container freighters
sri lankan pm to meet bush
pakistan stocks end higher
s. korean stocks continue winning streak
# killed in s. africa road accident
musharraf speaks highly of relations
world cup
nigeria signs new gas deal with oil majors
all judicial files on intellectual property rights to be available on
kenya issues tough warning to politicians
truck drivers and mechanics work for waste management
wuhan wins men 's soccer title
california braces for possible new wildfires
musharraf declares state of emergency in pakistan
uae oil giant announces oil prices
world bank to meet in nepal south asia
london share prices close lower
thai cns chief ignores demand to resign
us senate house to spend ### billion dollars on defense budget
israeli settlers continue construction despite government pledge
border patrol cracks down on illegal migrants
vietnam 's machinery imports up
siemens reports #.# billion euro internal probe
zambia 's biggest mining company to spill # billion dollars
city hopes to host first ever youth olympic games
landslide in northwest china kills #
food poisoning kills six in central china
afghanistan 's daily news
foreign investment in vietnam 's agriculture not enough
russian central bank sees #.#-percent growth in ####
brazilian ronaldinho for training with brazil
taiwanese fishing vessel arrives in kenya
mexico to host mexico 's ochoa
zambia blames overcrowded soccer fans for stampede
tokyo stocks open sharply lower
eriksson signs three thai young players
china 's sun hui claims women 's ##kg at world wushu championships
namibia imports maize from zambia
rescue work completed at australian mine
bangladesh cyclone death toll rises to ####
nigeria gets #th students for world cup
###,### beijingers encouraged to protect historical sites
zoe 's ark arrives in france
## militants killed in s. afghanistan
pakistan fm misses commonwealth meeting
china launches new on-line database
romania kazakhstan to cooperate in transporting caspian oil
vietnam to boost export of heavy industry products
cuban officials censor possible elimination of boxers protective head gear
croatia 's parliamentary elections kick off
##,### people die of smoking annually in vietnam

moderate quake rocks manila
chinese vice-premier stresses service trade

pla commander ends sweden visit
australian pm announces new cabinet
china honors mauritanian independence celebrations
marshall islands opposition declares victory
congo launches campaign against aids
roddick gives u.s. #-# lead in davis cup final
u.s. construction spending drops #.# percent in february
vietnamese pm calls for monitoring food safety
asian swimming record falls again
zimbabwe 's ruling party requests electoral fraud
lebanese speaker shocked at french foreign minister
brian cowen elected irish governing fianna fail leader
bangladesh india to resume train service
beijing olympic organizers criticized for sabotage
china 's chery automobile reports ## percent growth in exports
oman makes preparation for beijing olympic torch relay
s pore shares close higher
china 's economic growth slows to ##.# percent in first quarter
white house says carter hamas meeting not useful
nine killed in philippine bus crash

pltrdy · April 26, 2017, 10:03am

The problem is then related to files2rouge.

It must be caused by a blank line in your file. As a quick fix, remove the empty lines; I'll patch files2rouge, since this is not good behavior.
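
One caveat with the quick fix: deleting blank lines from pred.txt alone shifts every later prediction against its reference. A safer sketch, assuming both files have one entry per line in the same order, drops the blank prediction together with its reference. The function name and the reference strings below are hypothetical.

```python
def drop_blank_pairs(predictions, references):
    """Keep only (prediction, reference) pairs where both lines are non-blank."""
    if len(predictions) != len(references):
        raise ValueError(
            f"line count mismatch: {len(predictions)} predictions, "
            f"{len(references)} references"
        )
    kept = [
        (pred, ref)
        for pred, ref in zip(predictions, references)
        if pred.strip() and ref.strip()
    ]
    kept_preds = [pred for pred, _ in kept]
    kept_refs = [ref for _, ref in kept]
    return kept_preds, kept_refs


# Predictions mimic the pred.txt excerpt above; references are placeholders.
preds = ["moderate quake rocks manila", "", "pla commander ends sweden visit"]
refs = ["placeholder reference one", "placeholder reference two", "placeholder reference three"]
clean_preds, clean_refs = drop_blank_pairs(preds, refs)
print(len(preds) - len(clean_preds), "pair(s) dropped")
```

Dropping pairs changes the evaluation set, so scores computed this way are not strictly comparable with scores on the full test set.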

viratvivek · April 26, 2017, 11:02am

Can you please upload your version of files2rouge? I removed the blank lines but still get the same error.

RaghuS · April 26, 2017, 11:34am

I am also getting the same error. Maybe it is because of the <unk> tokens generated in pred.txt. Could you please upload a correctly generated pred.txt?

pltrdy · April 26, 2017, 11:59am

You can find it here.

The prediction file is the one I generated for the tutorial. I ran the scoring again using the last commit of files2rouge.

RaghuS · April 27, 2017, 9:09am

It worked.
The problem was that libxml-parser-perl was not installed.
The result I got is as follows:
ROUGE-1 (F): 0.342369
ROUGE-2 (F): 0.162145
ROUGE-3 (F): 0.088410
ROUGE-L (F): 0.319768
ROUGE-S4 (F): 0.146097
Thanks for your help!
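
For anyone hitting the same "non-zero exit status 2" from ROUGE-1.5.5.pl, a quick check is whether Perl can load the XML parser module. The sketch below assumes the missing piece is the XML::Parser module, which the libxml-parser-perl package provides on Debian/Ubuntu; the helper name is hypothetical.

```python
import subprocess


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # perl itself is not on PATH
        return False
    return result.returncode == 0


if not perl_module_available("XML::Parser"):
    print("XML::Parser not found; on Debian/Ubuntu it is packaged as libxml-parser-perl")
```

If the check passes and ROUGE still fails, running the ROUGE-1.5.5.pl command from the traceback by hand should show the underlying Perl error.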
