Why Lua/Torch? (Please don't hate me for this question.)

performance
pytorch

(Devin G Bost) #1

I’m really hoping that I’m not going to irritate anyone with this post, but I genuinely want to know the answer (not to attack anyone, but to better understand the initial vision for this project).

I’m curious why the Lua-based Torch framework was chosen for the architecture of OpenNMT. I’ve encountered some resistance to adopting Lua in one of the teams I’ve been working with, so I’m looking for the reasoning behind the original decision to invest the future of OpenNMT in Torch (or perhaps for reasons to look into OpenNMT-py instead). From what I’ve been told, Lua is becoming less popular for new research, though I don’t know whether that’s true, especially since Google is making Luarocks a Google Summer-Of-Code project. I also found this comment from an AI researcher at Facebook (answering the original poster’s question): Roadmap for Torch and PyTorch.

I recognize that Lua is fast, simple, and lightweight, but I also recognize that it doesn’t have quite the same breadth or depth of industry support as other languages (such as Python), which (from my understanding) limits its usability for overall software architecture, particularly for high-scale, high-performance, highly-available, highly-complex, elastic cloud-based web applications.
Here’s the comparison between the two languages on Google Trends: Google Trends comparison: Lua programming vs Python programming

Also, Torch doesn’t run well on Windows, though I recognize that a lot of people don’t care about running machine learning software on Windows at all.

So, why was Lua/Torch chosen as the primary platform/architecture for this project, rather than starting with another language and framework, such as Python/PyTorch or a framework with Keras bindings, such as TensorFlow, Theano or CNTK?

I also recognize that performance tests have indicated that Torch is very fast compared to other frameworks, especially in multi-GPU configurations.


(Guillaume Klein) #2

OpenNMT is derived from seq2seq-attn, developed by Yoon Kim at Harvard. It is a strong sequence-to-sequence implementation in Torch that already supported many model configurations. The first goal of the OpenNMT initiative was to reorganize this project to reduce redundancy and facilitate further development.

So the choice of LuaTorch is first of all a legacy one, and we are still using it because we have never encountered hard limitations, for either research or production use. It has also often proved to perform better than other frameworks, which is important as many users only have a single GPU.

However, you are correct to say that Lua is not a popular choice: poor adoption, few quality libraries, etc. These do not attract developers and sometimes make working with it a pain. But these are non-technical limitations.

As for building high-scale, cloud-based web applications:

I don’t think you would want to use either Lua or Python for that anyway. So these limitations do not seem directly relevant when choosing a deep learning framework, unless it directly provides a serving system, like TensorFlow Serving.

If I were restarting from scratch today, I would certainly pick TensorFlow for its adoption and support. You can take a look at Google’s seq2seq, which is well designed but unfortunately has its issues and is no longer maintained.


(Etienne Monneret) #3

I don’t know what kind of programming language would have been better for such devs. But I can confirm that I have often looked at the ONMT Lua code and been puzzled by the way it’s written. Taking the time to learn Lua from the basics isn’t a priority for me compared to a lot of other things, so… it’s a bit frustrating.
:wink:


(Devin G Bost) #4

I just learned that PyTorch runs on Windows, unlike TensorFlow. I don’t know how many Windows users are here on the forum, but I know that there are plenty of AI engineers who would greatly prefer to use their usual Windows workstation environment for model training. TensorFlow appears unlikely to support Windows any time soon because of its heavy dependency on symbolic linking via Bazel.

It also appears that PyTorch is gaining a lot of momentum in terms of feature development, and its architecture looks more scalable and easier to maintain than TensorFlow and the other static-graph-based neural network frameworks.