Build a Glossary


I was wondering if anyone has found a way to generate a glossary with OpenNMT?

My goal is to generate a glossary so I can fix it manually and then use it for training. I'm open to suggestions if there are better ways to do that!

Did you try fast_align? GitHub - clab/fast_align: Simple, fast unsupervised word aligner
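A minimal sketch of how fast_align output could be turned into a candidate glossary for manual review. It assumes tokenized, whitespace-separated parallel sentences, fast_align's `source ||| target` input format, and its Pharaoh-style output (one line of 0-based `i-j` links per sentence pair). The function names and the `min_count`/`min_prob` thresholds are hypothetical choices for illustration, not part of fast_align itself, and the alignment lines in the example are hand-written rather than real fast_align output.

```python
from collections import Counter


def to_fast_align_format(src_lines, tgt_lines):
    """Join tokenized parallel sentences into fast_align's 'source ||| target' input lines."""
    return [f"{s.strip()} ||| {t.strip()}" for s, t in zip(src_lines, tgt_lines)]


def extract_glossary(src_lines, tgt_lines, align_lines, min_count=2, min_prob=0.5):
    """Count aligned word pairs and keep frequent, consistent ones as glossary candidates.

    Returns (src_word, tgt_word, count, prob) tuples, where prob is the share of
    the source word's alignment links that point to this target word.
    min_count and min_prob are hypothetical thresholds; tune them on your own data.
    """
    pair_counts = Counter()
    src_link_counts = Counter()
    for src, tgt, align in zip(src_lines, tgt_lines, align_lines):
        s_toks, t_toks = src.split(), tgt.split()
        for link in align.split():
            # Each link is "i-j": source token i is aligned to target token j (0-based).
            i, j = map(int, link.split("-"))
            pair_counts[(s_toks[i], t_toks[j])] += 1
            src_link_counts[s_toks[i]] += 1

    candidates = []
    for (s, t), count in pair_counts.items():
        prob = count / src_link_counts[s]
        if count >= min_count and prob >= min_prob:
            candidates.append((s, t, count, prob))
    # Most frequent pairs first, so manual review starts with the highest-impact entries.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


if __name__ == "__main__":
    src = ["the house", "the car", "a house"]
    tgt = ["la maison", "la voiture", "une maison"]
    print(to_fast_align_format(src, tgt)[0])  # the house ||| la maison
    align = ["0-0 1-1", "0-0 1-1", "0-0 1-1"]  # hand-written example links
    print(extract_glossary(src, tgt, align))
    # [('house', 'maison', 2, 1.0), ('the', 'la', 2, 1.0)]
```

Note that this only yields single-word pairs; multi-word terms would need some form of phrase extraction on top of the word links.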

I haven't, but I'll give it a shot :slight_smile: Thanks!

I have a Wiktionary scraping script for dictionary data.

Thank you, but I need to build a glossary or custom dictionary based on my own data :stuck_out_tongue:

But I'll keep your script in mind. It could be handy in some situations!

Hello Terence,

Thanks for your suggestion. I’ve been playing around with fast_align and I’m really happy with the results!

I'm thinking of leveraging it a little more by training a custom model and using its evaluation (scoring) feature to get the model's prediction score for each aligned pair generated by fast_align. Have you ever tried that? I'm hoping this kind of filtering will give even better results.
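A minimal sketch of the filtering step described above, assuming you have already scored each aligned pair with your own model and have `(source term, target term, score)` triples where a higher score (e.g. a log-probability) means the model finds the pair more plausible. How the scores are produced is left open here; `filter_by_model_score` and the threshold value are hypothetical, not an OpenNMT API.

```python
def filter_by_model_score(scored_pairs, threshold):
    """Keep, for each source term, the highest-scoring target term with score >= threshold.

    scored_pairs: iterable of (src_term, tgt_term, score). The score is assumed to
    be a log-probability-like value from your own model (higher is better).
    """
    best = {}
    for src, tgt, score in scored_pairs:
        score = float(score)
        if score < threshold:
            continue  # too unlikely under the model: drop the pair
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    # Sorted by source term so the output is easy to review by hand.
    return sorted((src, tgt, score) for src, (tgt, score) in best.items())


if __name__ == "__main__":
    pairs = [
        ("house", "maison", -0.2),
        ("house", "domicile", -1.5),
        ("the", "la", -0.1),
        ("car", "auto", -4.0),
    ]
    print(filter_by_model_score(pairs, threshold=-2.0))
    # [('house', 'maison', -0.2), ('the', 'la', -0.1)]
```

One design point to consider: raw log-probabilities tend to penalize longer (multi-token) entries, so normalizing the score by target length may give a fairer threshold.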

Best regards,

Hello Samuel, glad you found fast_align useful; you certainly got further than I did. I had intended to do some experiments but got sidetracked by other things :slight_smile: