I’d like to try integrating an open-source translation engine into a CAT tool for Swedish-to-English translation.
First of all, how does one gather the data on which to train OpenNMT? Can it perform comparably to Google Translate or Facebook’s new translation engine? Why or why not? That is, are their learning algorithms fundamentally better for some reason, such as industry secrets or more computing power? And what about their data, meaning the corpora or web crawlers they use? Is it at all possible for an individual to set up a system just as good as theirs, and if so, how?
More broadly, is it possible for a machine translation system to be as comprehensive as a state-of-the-art dictionary? For example, if we could simply feed it the most exhaustive corpus imaginable, could we hope that it would provide very effective, encyclopedic translation suggestions for a wide variety of obscure terms and expressions? In other words, could the system actually begin to compete with the best-known dictionaries in coverage and accuracy, or even surpass them?
Lastly, is there any exhaustive list, or a keyword I could search for, covering computational systems that can provide any kind of word reference? This could be a list of synonyms, a list of translations, or any kind of semantic content or analysis that in effect provides a “definition”, or at least a rough clarification of “what this word means”. I ask simply to know, as a translator, what tools are out there beyond dictionaries and machine translation.
Thanks very much.