It seems the problem is related to your Lua installation.
Are you using LuaJIT or Lua 5.2?
Can you try reinstalling with Lua 5.2? That might solve your problem.
Thanks! Reinstalling Torch using Lua 5.2 instead of LuaJIT appears to have solved the issue. It might be worth including that in the installation instructions, since they currently simply refer people to the standard Torch installation guide, which defaults to using LuaJIT.
That said, everything I’ve read about LuaJIT indicates that it is considerably faster than the normal Lua interpreter. Will switching to Lua 5.2 make everything slower? I’m guessing that since the brunt of the work is done on the GPU, it won’t really matter much…
With Lua 5.2 instead of LuaJIT, learn_bpe.lua is painfully slow. I’m going to see if I can change it to use tds.Vec instead of a built-in table, since that should work around the 2GB limit in LuaJIT. If I get it to work, I’ll send a PR; if not, I’ll just open an issue.
This is correct. It’s mostly the plain Lua scripts that will be slower. However, you can easily install both Lua versions in parallel and switch as needed.
In general, do you recommend using Lua 5.1 rather than LuaJIT? If so, then that should really be clearly stated in the Installation section of the docs (which should also be updated to reflect that luarocks install bit32 is also required).
Switching from using tables to tds.Hash and tds.Vec solves the 2GB issue in LuaJIT, but the resulting code is just as slow as with Lua 5.2 - roughly an order of magnitude slower than LuaJIT (when it can fit everything in under 2GB).
It seems as though iterating through a tds.Hash (with either pairs or ipairs) is not particularly fast. Any ideas about how to make this faster?
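For anyone who wants to check the iteration cost on their own setup, here is a rough timing sketch. It is not taken from learn_bpe.lua: the helper name time_pairs, the key format, and the sizes are all made up for illustration, and the tds part only runs if the tds rock is installed.

```lua
-- Rough timing sketch (hypothetical helper and sizes, not from learn_bpe.lua).
-- tds is loaded optionally so the script still runs where it is not installed.
local ok, tds = pcall(require, 'tds')

-- Iterate over a container with pairs `rounds` times and sum the values.
-- Returns the elapsed CPU time and the sum (the sum doubles as a sanity check).
local function time_pairs(container, rounds)
  local start, total = os.clock(), 0
  for _ = 1, rounds do
    for _, v in pairs(container) do
      total = total + v
    end
  end
  return os.clock() - start, total
end

local n, rounds = 10000, 20

-- Baseline: a built-in table with string keys.
local plain = {}
for i = 1, n do plain['k' .. i] = i end
local t_plain, sum_plain = time_pairs(plain, rounds)
print(string.format('table:    %.3fs', t_plain))

-- Same data in a tds.Hash, only when tds is available.
if ok then
  local hash = tds.Hash()
  for i = 1, n do hash['k' .. i] = i end
  local t_hash, sum_hash = time_pairs(hash, rounds)
  assert(sum_hash == sum_plain)
  print(string.format('tds.Hash: %.3fs', t_hash))
end
```

The absolute numbers will depend on the interpreter and machine, so treat them only as a relative comparison.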
Not yet - I’ve been playing around with the code a lot to try to speed it up, and it’s currently quite messy (and not quite functional). I’ll clean it up and push it to a branch on my GitHub fork, and will let you know once it’s available.
Update: I think I may have found some middle ground, changing only some tables to either tds.Vec or tds.Hash (depending on how they’re addressed). For smaller corpora where the original learn_bpe.lua worked OK under LuaJIT, my updated version is only marginally slower, and for larger corpora where LuaJIT crashed, it seems to work OK now.
The key was to keep stats as a table, since the code iterates through it over and over (and iterating over a tds.Hash appears to be very slow).
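To illustrate the idea, here is a minimal sketch. It is not the actual learn_bpe.lua code: the names tokens, new_vec, append and most_frequent are invented for the example, and it counts whole words rather than symbol pairs to keep it short. The large, index-addressed data goes into a tds.Vec (when tds is installed), while the small structure that gets iterated over and over stays a plain table.

```lua
-- Illustrative sketch only; names are hypothetical, not from learn_bpe.lua.
-- tds is optional so the snippet also runs on a plain Lua interpreter.
local ok, tds = pcall(require, 'tds')

-- Allocate a tds.Vec (stored outside the LuaJIT heap) when available,
-- otherwise fall back to a built-in table.
local function new_vec()
  if ok then return tds.Vec() end
  return {}
end

local function append(vec, value)
  if ok then vec:insert(value) else table.insert(vec, value) end
end

-- Large data, addressed by index: one entry per corpus token.
local tokens = new_vec()
for word in ('low lower low newest low'):gmatch('%S+') do
  append(tokens, word)
end

-- Small data, iterated repeatedly: kept as a plain table on purpose.
local stats = {}
for i = 1, #tokens do
  local w = tokens[i]
  stats[w] = (stats[w] or 0) + 1
end

-- The hot loop: scan stats for the most frequent entry.
-- Ties are broken alphabetically so the result is deterministic.
local function most_frequent(counts)
  local best, best_count
  for key, count in pairs(counts) do
    if not best_count or count > best_count
       or (count == best_count and key < best) then
      best, best_count = key, count
    end
  end
  return best, best_count
end

local best, best_count = most_frequent(stats)
print(best, best_count)
```

The trade-off is that stats still lives on the LuaJIT heap, so this only helps as long as stats itself stays comfortably below the memory limit.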
I’m going to run some more tests and, if all goes well, will push the updated version to a branch on my fork of the repo and will submit a PR.
Done! I’ve tested with various sets of input data and sizes, and confirmed that it always produces the same results as the old version under Lua 5.1, except that it runs about 10 times faster (and pretty much as fast as the old version under LuaJIT). I’ve submitted a pull request too.