Text Summarization on Gigaword and ROUGE Scoring

Do I need PyTorch to run this? How do I install the package used in the Python code?

Yes, you need PyTorch. You can install it following the instructions from its website http://pytorch.org/

Could you share a link to the model textsum_acc_51.38_ppl_12.59_e13.pt so that I can test it? 50 hours is too long for me, so a link where I can download the model file would help a lot. Thank you in advance.

Here is the model: textsum_acc_51.38_ppl_12.59_e13.pt

This is awesome, Twang :slight_smile: Delighted by your reply, and thank you for the model file.

I guess the model was trained on a GPU. I currently do not have one, so is it feasible to use the model on a CPU? Kindly enlighten me.

No problem. Have you tried removing -gpu?

If just removing -gpu does not work please have a look here: http://opennmt.net/OpenNMT/translation/inference/

Hi pltrdy, thanks for this helpful tutorial! May I ask what the dataset you linked in your post is? Is it a sub-sampled version of Gigaword?

It is the Gigaword dataset as used in Rush et al (2015).

Thanks. But I assume it is not the whole Gigaword dataset right?

Indeed.


(from Rush et al)

Hi,

When I try to use files2rouge to compute the ROUGE score, I get this error:

  File "files2rouge/files2rouge.py", line 251
    print(*args, **kwargs, file=saveto)
                         ^
SyntaxError: invalid syntax

I have python 2.7.13 within Anaconda 4.4.0.
Could you please have a look? Thanks.

Indeed. I just pushed new commits in both pythonrouge and files2rouge.
You must then pull & run setup.py again for both.
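For context: Python 2.7 (and Python 3 before 3.5) rejects a keyword argument placed after `**kwargs` in a call, which is why the parser stops at that comma. The snippet below is only a minimal sketch of one portable way to write such a call, not the actual change pushed to files2rouge; the `log` function and its `saveto` keyword are hypothetical names used for illustration.

```python
from __future__ import print_function  # print becomes a function on Python 2.7

import io
import sys


def log(*args, **kwargs):
    """Print to the stream passed as saveto= (hypothetical keyword)."""
    # Move the stream into the kwargs dict instead of writing
    # print(*args, **kwargs, file=saveto), which older parsers reject.
    kwargs["file"] = kwargs.pop("saveto", sys.stdout)
    print(*args, **kwargs)


# Demo (run here under Python 3): capture the output in memory.
buffer = io.StringIO()
log("ROUGE-1", 0.33, sep=": ", saveto=buffer)
```

Here `buffer.getvalue()` holds the printed line, and calling `log` without `saveto` simply prints to standard output.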

My python 2.7 files2rouge now works.

I am using the pre-trained model given by twang. I am facing issues in the implementation of the model.

My Ubuntu 16.04 server doesn’t have a GPU.

Issue:

$ python translate.py -model textsum_acc_51.38_ppl_12.59_e13.pt -src /sumdata/Giga/input.txt

Traceback (most recent call last):

    ImportError: No module named Dict

I installed a library called “dict” (I couldn’t find any library named “Dict”), but that did not solve the problem.

python version : 2.7.12

torch versions:
torch==0.2.0.post3
torchtext==0.2.0a0

I am new to Python and have not been able to crack this. Any leads?

I must say I have never used OpenNMT-py with Python 2.7.

I would first recommend trying Python 3.x.

There is now a TensorFlow alternative (see "OpenNMT-tf: a new alternative").
It would be interesting to provide a tutorial using this alternative version.

Hi all,

Thanks for the helpful instructions. I trained the model using a GPU and it works fine on the sample data in the Giga folder. However, for any other articles I try, it either generates no output or just a word or two. Any suggestions?

I’m trying these two short articles:

british intelligence sources report that the group of approximately five somali pirates who have captured the mv Tanya off the somalian coast call themselves the waterways protection regional guard
sources confirmed that diamonds were shipped from yemen to moscow by georgiy giunter on december.

Output: protection is regional guard protection regional guard

giunter is a dealer in jewelry and precious stones who does business in the middle east and russia. giunter is a money launderer in addition to his legitimate gemstone work.

Output: sent from yemen to moscow

I tried to generate a translation with OpenNMT-py using @twang's pre-trained model:

python3 translate.py -model textsum_acc_51.38_ppl_12.59_e13.pt -src ../data/bitcoin-tosum.txt

However, I got this result:

Traceback (most recent call last):
  File "translate.py", line 116, in <module>
    main()
  File "translate.py", line 39, in main
    onmt.ModelConstructor.load_test_model(opt, dummy_opt.__dict__)
  File "/Users/ifadardin/Documents/Python/OpenNMT-py-master/onmt/ModelConstructor.py", line 114, in load_test_model
    map_location=lambda storage, loc: storage)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
ImportError: No module named 'onmt.Dict'

I checked on GitHub, and it might be because onmt.Dict was removed in last summer's update. Is there any workaround?

@SinaMohseni It’s not always easy to debug a model’s behavior. Your case may be related to https://github.com/OpenNMT/OpenNMT-py/issues/457, i.e. we sometimes need to force a minimum output size, otherwise decoding stops too early. If that does not help, I would suggest opening an issue.
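To illustrate what forcing a minimum output size means, here is a toy greedy decoder in plain Python. It is a sketch of the general idea only, not OpenNMT-py's implementation: `greedy_decode`, `toy_scores`, the `min_length` parameter and the scores are all made up for the example. The end-of-sequence token is simply excluded from the candidates until the output is long enough.

```python
EOS = "</s>"


def toy_scores(prefix):
    """Hypothetical scorer: prefers to stop once two tokens were produced."""
    vocab = ["sent", "from", "yemen", "to", "moscow"]
    next_word = vocab[len(prefix) % len(vocab)]
    eos_score = 0.9 if len(prefix) >= 2 else 0.1
    return {next_word: 0.5, EOS: eos_score}


def greedy_decode(score_fn, min_length=0, max_length=20):
    """Pick the best token at each step, blocking EOS before min_length."""
    output = []
    while len(output) < max_length:
        scores = dict(score_fn(output))
        if len(output) < min_length:
            scores.pop(EOS, None)  # EOS is not allowed yet
        if not scores:
            break
        token = max(scores, key=scores.get)
        if token == EOS:
            break
        output.append(token)
    return output


short = greedy_decode(toy_scores)                 # stops as soon as EOS wins
longer = greedy_decode(toy_scores, min_length=4)  # EOS blocked for 4 steps
```

With this toy scorer, `short` ends after two tokens, while `longer` continues until four tokens have been produced.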


@Ifad As you said, it is occurring because the model was trained with another OpenNMT-py version. It makes sense to open an issue for this.
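Some background on why the version matters: a pickled object stores the module path of its class, so loading fails with an ImportError when that module no longer exists. The sketch below reproduces this with the standard `pickle` module only. The module name `legacy_vocab_demo` and the small `Dict` class are hypothetical stand-ins, and re-registering a stub module is a generic pickle trick shown for illustration, not a confirmed fix for this checkpoint.

```python
import pickle
import sys
import types

MODULE_NAME = "legacy_vocab_demo"  # hypothetical stand-in for a removed module


class Dict(object):
    """Minimal stand-in for a class that lived in the old module."""

    def __init__(self, labels):
        self.labels = labels


# Pretend the class was defined inside the old module.
Dict.__module__ = MODULE_NAME
Dict.__qualname__ = "Dict"
legacy = types.ModuleType(MODULE_NAME)
legacy.Dict = Dict
sys.modules[MODULE_NAME] = legacy

# "Save a checkpoint" while the module still exists.
saved = pickle.dumps(Dict(["<s>", "</s>"]))

# Simulate upgrading to a release where the module was removed.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(saved)
    failed = None
except ImportError as err:  # ModuleNotFoundError on recent Python versions
    failed = err

# Generic workaround: make a compatible module importable again.
sys.modules[MODULE_NAME] = legacy
restored = pickle.loads(saved)
```

In practice, the safer route is usually to load the model with the same OpenNMT-py revision it was trained with, since a stub only helps if its classes match what the checkpoint expects.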