Hi!
I have a question: if I add to my configuration file a corpus that has only path_src (the other corpora also have path_tgt), will that corpus be used for language modeling of the source language?
If I understand your question correctly, the answer is no, but it may be easier to help if you explain what you would like to accomplish.
Dear Carmela,
If you use the regular Transformer configuration, this would result in an encoder-decoder model. However, most GPT-like models adopt a decoder-only architecture.
The following example shows how you can build GPT-like models with OpenNMT-py. I tried this last year to build a small language model, and the results were promising.
https://opennmt.net/OpenNMT-py/examples/LanguageModelGeneration.html
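To illustrate why a decoder-only language model needs only one side of the data, here is a minimal, toolkit-agnostic sketch (not OpenNMT-py code). It assumes plain next-token prediction: the training target is the input text shifted by one position, and a causal mask stops each position from attending to later ones. The helper names and the "<s>"/"</s>" boundary tokens are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sentence-boundary tokens; real toolkits define their own.
BOS, EOS = "<s>", "</s>"


def make_lm_pair(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Return (inputs, targets) for next-token prediction.

    The targets are the inputs shifted left by one position, so a
    monolingual (source-only) sentence is enough to build a training pair.
    """
    seq = [BOS] + tokens + [EOS]
    return seq[:-1], seq[1:]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


inputs, targets = make_lm_pair("the cat sat on the mat".split())
for inp, tgt in zip(inputs, targets):
    print(f"{inp!r:>8} -> {tgt!r}")  # each token is trained to predict the next one
mask = causal_mask(len(inputs))
```

An encoder-decoder Transformer, by contrast, reads a separate target sequence in the decoder, which is why it is normally trained on parallel source/target pairs.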
You could also look at the language models on Hugging Face and try fine-tuning relatively small ones on Google Colab.
All the best,
Yasmin