Looking for a master's thesis topic



I am a Computer Science (Mathematical Information Technology) student at University of Jyväskylä. I am currently looking for the topic, study method, and aims for my master’s thesis focusing on neural machine translation.

Currently, I am still trying to figure out whether to make my thesis just a literature review or something a little more hands-on. I find that if I want to do something hands-on, OpenNMT is a great project to work with. I have been reading about OpenNMT, including the technical report by Klein et al. My question is: what is there to do with OpenNMT? Are there some features, theories, or different techniques to apply, or at least test, with OpenNMT (within the scope of a master’s thesis)?

I realise that if I familiarised myself with the code and read more about the topic, I could probably figure something out by myself, but at this point, it would help me tremendously to get some ideas from people who are already familiar with OpenNMT and the world of neural machine translation in general. Thank you!

(Vincent Nguyen) #2

It depends really on:

  • what you are willing to focus on: theory, coding, testing, …
  • the time you can devote to this.

There is something that we never found the time to do:
revisit the benchmark of each kind of model against its related paper.

Lately we have focused on the Transformer, but we never released updated results on the WMT tasks for the LSTM (even though I did some with the Lua framework for the conference), CNN, or SRU models.

Then, from a more theoretical angle, there is plenty of testing to accomplish by mixing some configurations.
For this, some coding is required, and it would be welcome as a contribution.



Hi Vincent,

Thank you for the reply. I realise now that I didn’t explain what I mean by ‘hands-on’, sorry about that. I mean applying some kind of theory to OpenNMT, so in this case, probably coding and/or testing.

As for the time I can devote, our university recommends (based on study points) 800-900 hours of work for the master’s thesis, of which about 40% goes into theoretical background and 40% into data gathering/analysis, but I am ready to devote even more time (and probably will spend more time on it anyway). I am starting my thesis now and I need to graduate (at the latest) before July 2020, so I have allocated about 1.5 years for the thesis.

I will take this information to my instructor, but I am also interested in details, for example, about the testing involved in mixing some configurations.

Thank you very much!

(Bachstelze) #4

Hi mucla,

There have been several studies comparing different architectures:

In case you just want to compare different architectures, I would suggest:

Mixing different architectures, though, is an active research field:

But the open source community still lacks a multi-source architecture or an automatic post-editing approach (which, in the simplest case, I see as multi-source translation). So applying post-editing methods in machine translation is a promising field:

An idea for a new translation method would be an iterative post-editing architecture: the architecture generates a discrete translation output and then feeds that output back iteratively as a new input.

This idea would be comparable to an iteration in the decoder:

“Our approach differs from automatic post-editing since it does not require post-edited text which is a scarce resource (Simard et al., 2007; Bojar et al., 2016).” from Roman Novak, Michael Auli, David Grangier in Iterative Refinement for Machine Translation; 2016
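The loop described above can be sketched in a few lines. This is only an illustration of the idea, not OpenNMT code: `translate` here is a stand-in for a real NMT system (a toy function that repairs one word per pass), and the stopping rule is simply "the output no longer changes".

```python
# Sketch of iterative post-editing: feed the system's own discrete
# output back in as the next input, until it reaches a fixed point.

def translate(sentence):
    # Toy stand-in for an NMT/post-editing model:
    # it corrects at most one known error per call.
    fixes = {"Haus": "house", "ist": "is"}
    words = sentence.split()
    for i, w in enumerate(words):
        if w in fixes:
            words[i] = fixes[w]
            break  # one correction per pass, so the iteration is visible
    return " ".join(words)

def iterative_refinement(source, max_iters=10):
    current = translate(source)
    for _ in range(max_iters):
        refined = translate(current)
        if refined == current:  # break point: output is a fixed point
            break
        current = refined
    return current

print(iterative_refinement("the Haus ist red"))  # -> "the house is red"
```

With a real model, the open question is exactly the one below: when to stop iterating, since a strongly biased system may converge too early, or not at all.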

The need for those specific architectures is unclear, because you can simply produce a training corpus with erroneous translations by back-translating a parallel text with a system that was not trained on it.
After implementing a multi-source architecture, the aim in testing would be to find a break point in the iteration that yields the best translation.
On the one hand, the translation system itself could provide a break point through its strong bias; on the other hand, the iteration might hardly converge at all. If it does not converge, the aim would be to generate a set of suggestions and estimate their quality with simple methods like the vector space model (http://multitranslation.space/metric), or leave the estimation to large, task-specific architectures:
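A minimal version of such a vector-space-model estimate is cosine similarity over bag-of-words vectors: score each suggested translation against a reference (or pseudo-reference) and keep the closest one. This is just a sketch of the general idea; the metric linked above may differ in detail.

```python
# Rank translation suggestions by cosine similarity of their
# bag-of-words vectors against a reference sentence.

from collections import Counter
from math import sqrt

def cosine(a, b):
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_suggestion(suggestions, reference):
    # Pick the candidate closest to the reference in the vector space.
    return max(suggestions, key=lambda s: cosine(s, reference))

candidates = ["the house is red", "the horse is read", "a red house stands"]
print(best_suggestion(candidates, "the house is red"))  # -> "the house is red"
```

In a real setup the reference would not be available at test time, so the comparison would instead be against the source (via bilingual embeddings) or against the other candidates.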

This is just my idea for an interesting piece of research. Accordingly, I don’t know how complex it will be in the end, or whether the results will be usable. Just putting my two cents in.


Hi Bachstelze,

Thank you for the thorough reply! It does seem like there is room for research! I will discuss this with my instructor.

  • mucla

(Bachstelze) #6

Hi mucla!

How was the discussion? Let me know if I can help you with broader questions, but keep in mind that I am just a random person from the internet. Also be sure to have good hardware access and a working setup before you start.

Greetings wagtail