Text Summarization on Gigaword and ROUGE Scoring

codehead · April 28, 2017, 6:28pm

Do I need PyTorch to run this? How do I install the package used in the Python code?

twang · April 28, 2017, 6:49pm

Yes, you need PyTorch. You can install it by following the instructions on its website: http://pytorch.org/

JafferWilson · June 7, 2017, 4:09am

Could you share a link to the model textsum_acc_51.38_ppl_12.59_e13.pt so that I can test it? 50 hours is too much for me, so a link where I can download the model file would help a lot. Thank you in advance.

twang · June 9, 2017, 9:37pm

Here is the model: textsum_acc_51.38_ppl_12.59_e13.pt

JafferWilson · June 10, 2017, 1:16am

This is awesome Twang… Delighted by your reply and thank you for the model file.

JafferWilson · June 10, 2017, 1:47am

The model was trained on a GPU, I guess. I currently do not have a GPU, so how can I use it on a CPU? Kindly enlighten me.

twang · June 10, 2017, 3:33am

No problem. Have you tried removing -gpu?

pltrdy · June 13, 2017, 10:12am

If just removing -gpu does not work, please have a look here: http://opennmt.net/OpenNMT/translation/inference/
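For reference, PyTorch can remap a checkpoint onto the CPU at load time through the map_location argument of torch.load; the traceback further down this thread shows OpenNMT-py's model loader passing the same lambda. Here is a minimal sketch, assuming only that PyTorch is installed. load_on_cpu is a hypothetical helper, and the in-memory checkpoint is toy data standing in for a real model file, so this does not demonstrate loading the shared model itself.

```python
import io

import torch


def load_on_cpu(source):
    """Load a checkpoint and keep every tensor on the CPU (hypothetical helper)."""
    # Returning the storage unchanged tells torch.load not to move it to a GPU.
    return torch.load(source, map_location=lambda storage, loc: storage)


# Toy checkpoint written to memory; a real one would be a .pt file on disk.
buffer = io.BytesIO()
torch.save({"weights": torch.ones(2, 3)}, buffer)
buffer.seek(0)

checkpoint = load_on_cpu(buffer)
print(checkpoint["weights"].device)
```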

bo123 · June 29, 2017, 5:19pm

Hi pltrdy, thanks for this helpful tutorial! May I ask what dataset you linked in your post? Is it sub-sampled Gigaword data?

pltrdy · July 3, 2017, 3:22pm

It is the Gigaword dataset as used in Rush et al. (2015).

bo123 · July 3, 2017, 4:02pm

Thanks. But I assume it is not the whole Gigaword dataset, right?

pltrdy · July 3, 2017, 4:09pm

Indeed.

(from Rush et al)

lijun_wu · August 30, 2017, 2:18am

Hi,

When I try to use files2rouge to compute the ROUGE score, I get this error:

  File "files2rouge/files2rouge.py", line 251
    print(*args, **kwargs, file=saveto)
                         ^
SyntaxError: invalid syntax

I have Python 2.7.13 with Anaconda 4.4.0.
Could you please have a look? Thanks.

pltrdy · August 30, 2017, 8:46am

Indeed. I just pushed new commits in both pythonrouge and files2rouge.
You must then pull & run setup.py again for both.

My python 2.7 files2rouge now works.
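For anyone hitting the same SyntaxError: passing a keyword argument after **kwargs in a call is only accepted from Python 3.5 onward, which is consistent with the caret position in the traceback above. One portable pattern is to merge the extra keyword into kwargs before the call. This is a sketch of that pattern only, not the actual change made in files2rouge; the log function and its saveto keyword are hypothetical.

```python
import io
import sys


def log(*args, **kwargs):
    """Print args to `saveto` (hypothetical keyword, defaults to stdout)."""
    saveto = kwargs.pop("saveto", sys.stdout)
    # Putting `file` into kwargs avoids writing a keyword after **kwargs,
    # which interpreters older than Python 3.5 reject at parse time.
    # On Python 2 the module would also need, as its first statement:
    #   from __future__ import print_function
    kwargs["file"] = saveto
    print(*args, **kwargs)


buffer = io.StringIO()
log("a", "b", sep="-", saveto=buffer)
```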

vikash · August 31, 2017, 9:34am

I am using the pre-trained model given by twang, and I am having trouble running it.

My Ubuntu 16.04 server doesn’t have a GPU.

Issue:

$ python translate.py -model textsum_acc_51.38_ppl_12.59_e13.pt -src …/sumdata/Giga/input.txt

Traceback (most recent call last):

    ImportError: No module named Dict

I installed a library “dict” (I couldn’t find any library named “Dict”), but that didn’t solve the problem.

python version : 2.7.12

torch versions:
torch==0.2.0.post3
torchtext==0.2.0a0

I am new to Python and not able to crack this. Any leads?

pltrdy · September 1, 2017, 3:27pm

I must say I have never used OpenNMT-py with Python 2.7.

I would first recommend trying Python 3.x.

loretoparisi · December 1, 2017, 8:52am

There is now a TensorFlow version as well: OpenNMT-tf: a new alternative
It would be interesting to provide a tutorial using this alternative version.

SinaMohseni · December 19, 2017, 8:33pm

Hi all,

Thanks for the helpful instructions. I trained the model using a GPU, and it works fine on the sample data in the Giga folder. However, for any other articles that I try, it generates no output or just a word or two. Any suggestions?

I’m trying these two short articles:

british intelligence sources report that the group of approximately five somali pirates who have captured the mv Tanya off the somalian coast call themselves the waterways protection regional guard
sources confirmed that diamonds were shipped from yemen to moscow by georgiy giunter on december.

Output: protection is regional guard protection regional guard

giunter is a dealer in jewelry and precious stones who does business in the middle east and russia. giunter is a money launderer in addition to his legitimate gemstone work.

Output: sent from yemen to moscow

Ifad · December 31, 2017, 10:38am

I tried to generate a translation with OpenNMT-py using @twang's pre-trained model:

python3 translate.py -model textsum_acc_51.38_ppl_12.59_e13.pt -src ../data/bitcoin-tosum.txt

However, I got this result:

Traceback (most recent call last):
  File "translate.py", line 116, in <module>
    main()
  File "translate.py", line 39, in main
    onmt.ModelConstructor.load_test_model(opt, dummy_opt.__dict__)
  File "/Users/ifadardin/Documents/Python/OpenNMT-py-master/onmt/ModelConstructor.py", line 114, in load_test_model
    map_location=lambda storage, loc: storage)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
ImportError: No module named 'onmt.Dict'

I checked on GitHub: it might be because onmt.Dict was removed in an update last summer. Is there any workaround?

pltrdy · January 2, 2018, 9:09am

@SinaMohseni It’s not always easy to debug a model’s behavior. Your case may be related to https://github.com/OpenNMT/OpenNMT-py/issues/457, i.e. we sometimes need to force a minimum output length, otherwise decoding stops too early. If that does not help, I would suggest opening an issue.
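The idea of forcing a minimum output length can be sketched with a toy greedy decoder that ignores the end-of-sequence token until enough tokens have been produced. This is an illustration only, not OpenNMT-py's implementation: greedy_decode, the per-step score tables and the </s> marker are all hypothetical, and a real decoder would compute each step's scores from the tokens chosen so far.

```python
def greedy_decode(step_scores, eos="</s>", min_length=0):
    """Pick the best token per step, ignoring `eos` until `min_length` is reached.

    `step_scores` is a list of {token: score} dicts, one per decoding step
    (toy data; hypothetical stand-in for a model's output distribution).
    """
    output = []
    for scores in step_scores:
        candidates = dict(scores)
        if len(output) < min_length:
            # Too short to stop: remove the end-of-sequence option.
            candidates.pop(eos, None)
        if not candidates:
            break
        token = max(candidates, key=candidates.get)
        if token == eos:
            break
        output.append(token)
    return output


# Toy scores in which the end marker wins from the second step onward.
steps = [
    {"diamonds": 0.6, "shipped": 0.3, "</s>": 0.1},
    {"shipped": 0.4, "from": 0.1, "</s>": 0.5},
    {"from": 0.45, "</s>": 0.55},
    {"yemen": 0.3, "</s>": 0.7},
]

short = greedy_decode(steps)
longer = greedy_decode(steps, min_length=3)
```

Without the constraint the toy decoder stops after one token; with min_length=3 it keeps going until three tokens exist and only then accepts the end marker.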

@Ifad As you said, it is occurring because the model was trained with another OpenNMT-py version. It makes sense to open an issue for this.
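As background on why a version change produces this ImportError: the checkpoint is a pickle, and pickle records each class by its module path, so loading fails once that module no longer exists. One possible workaround, sketched here with the standard library only, is to register a stand-in module under the old name before loading. The names old_pkg_demo, Dict and CompatDict are hypothetical, and whether such a shim is enough for the real checkpoint is an assumption that depends on what the old onmt.Dict objects contained.

```python
import pickle
import sys
import types

OLD_PACKAGE = "old_pkg_demo"        # hypothetical package name
OLD_MODULE = OLD_PACKAGE + ".Dict"  # module that a later release removed


def register_module(name, **attrs):
    """Make an in-memory module importable under `name`."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    sys.modules[name] = module
    return module


def can_unpickle(payload):
    """Return True if `payload` loads, False if a module is missing."""
    try:
        pickle.loads(payload)
        return True
    except ImportError:
        return False


class Dict(object):
    """Toy vocabulary class, as the 'old release' defined it."""

    def __init__(self, labels):
        self.labels = list(labels)


# Step 1: save an object while the old module still exists.
Dict.__module__ = OLD_MODULE
Dict.__qualname__ = "Dict"
register_module(OLD_PACKAGE)
register_module(OLD_MODULE, Dict=Dict)
payload = pickle.dumps(Dict(["<s>", "</s>"]))

# Step 2: simulate the upgrade that removed the module; loading now fails.
sys.modules.pop(OLD_MODULE, None)
sys.modules.pop(OLD_PACKAGE, None)
loads_without_shim = can_unpickle(payload)


class CompatDict(object):
    """Stand-in class; pickle restores the saved attributes onto it."""


# Step 3: register a shim under the old path, then load again.
register_module(OLD_PACKAGE)
register_module(OLD_MODULE, Dict=CompatDict)
restored = pickle.loads(payload)
```

The other route, checking out the OpenNMT-py revision the model was trained with, avoids the shim entirely.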