I am trying to replicate im2text with OpenNMT-py v1.0.0.rc1 using my own dataset (images and their corresponding LaTeX equations). I followed exactly the same steps as I did for the 100K dataset provided in the documentation. The 100K dataset produced a BLEU score of ~87, while my dataset produced ~0.72.
I made sure that my dataset is formatted in exactly the same way as the 100K dataset. What I have found is that an <unk> keyword appears in most of my predicted equations, which leads me to the hypothesis that the vocab file I am using is not correct (I am using the same Vocab.txt file provided with the 100K dataset). I have also found that the “demo.vocab.pt” file generated for my dataset is very different in content from the one generated for the 100K dataset. Could you suggest what I should do? Also, can anyone please explain how this “demo.vocab.pt” file is made? I am having trouble deciphering the “Pytorch field” code used for this purpose.
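To test the vocab hypothesis, here is a minimal sketch of the check I have in mind: it counts how many tokens of my formulas are missing from the vocabulary. It is not part of OpenNMT-py. The helper names `load_vocab` and `oov_report` are my own, and I am assuming the vocab file has one token per line and that the formulas are already tokenized with spaces. The sample data below is made up for illustration.

```python
from collections import Counter


def load_vocab(lines):
    """Build a token set from vocab lines (assumption: one token per line)."""
    return {line.strip() for line in lines if line.strip()}


def oov_report(formula_lines, vocab):
    """Return (coverage fraction, Counter of out-of-vocabulary tokens).

    Assumption: each formula is already tokenized and space-separated.
    """
    total = 0
    missing = Counter()
    for line in formula_lines:
        for tok in line.split():
            total += 1
            if tok not in vocab:
                missing[tok] += 1
    covered = total - sum(missing.values())
    coverage = covered / total if total else 0.0
    return coverage, missing


# Made-up sample data; in practice these would be read from the vocab file
# and from my own formulas file with open(...).
sample_vocab = load_vocab(["\\frac", "{", "}", "x", "^", "2"])
sample_formulas = ["\\frac { x } { 2 }", "\\alpha ^ 2"]

coverage, missing = oov_report(sample_formulas, sample_vocab)
print("coverage:", coverage)
print("most common missing tokens:", missing.most_common(10))
```

If the coverage on my dataset turns out to be low, that would support the idea that the vocab from the 100K dataset does not match my formulas.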
Thanks in advance!