Possible to use the Transformer architecture to do what GPT-2 does?

Hi,

I’m trying to build a general text generation model like GPT-2, but with the Transformer model structure.

Is it possible? Are there any difficulties, problems, or tricks I should be aware of before I start?

As far as I know, beam search doesn’t handle long text generation well, as it can lead to degeneration. Should I use top-k together with top-p sampling during inference instead? I have implemented the top-p feature myself in OpenNMT-tf.
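
In case it is useful, here is a minimal NumPy sketch of how top-p (nucleus) filtering can work on a single logits vector before sampling. The function names, the p=0.9 cut-off, and the vocabulary size are only illustrative and not the OpenNMT-tf API:

```python
import numpy as np

def top_p_filter(logits, p=0.9):
    """Mask all logits outside the smallest set of tokens whose
    cumulative probability exceeds p (nucleus filtering)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)                                 # tokens by descending probability
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1   # keep at least one token
    keep = order[:cutoff]
    filtered = np.full_like(logits, -np.inf)
    filtered[keep] = logits[keep]
    return filtered

def sample_top_p(logits, p=0.9, temperature=1.0):
    """Sample one token id from the top-p-filtered distribution."""
    filtered = top_p_filter(logits / temperature, p)
    probs = np.exp(filtered - filtered.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Example: sample one token id from a random 1000-token vocabulary.
next_id = sample_top_p(np.random.randn(1000), p=0.9, temperature=0.8)
```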

Thank you very much!

It should work as long as you can represent the training data as source/target.
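
For example, one possible way to frame the data (purely as an illustration; the file names, whitespace tokenization, and the 64-token prefix length below are my own assumptions, not anything OpenNMT-tf requires) is to take the beginning of each document as the source and its continuation as the target:

```python
# Hypothetical preprocessing: each line of corpus.tok is one tokenized document.
PREFIX_LEN = 64

with open("corpus.tok") as corpus, \
     open("train.src", "w") as src, \
     open("train.tgt", "w") as tgt:
    for line in corpus:
        tokens = line.split()
        if len(tokens) <= PREFIX_LEN:   # skip documents too short to split
            continue
        # Source = the beginning of the document, target = its continuation,
        # so the model learns to continue a given prompt.
        src.write(" ".join(tokens[:PREFIX_LEN]) + "\n")
        tgt.write(" ".join(tokens[PREFIX_LEN:]) + "\n")
```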

If you are looking for diverse generation, you should indeed use random sampling.
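
For reference, a generic NumPy sketch of random sampling with a temperature and a top-k cut-off (not the OpenNMT-tf implementation; the 0.8 temperature and k=40 are just example values):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=40):
    """Draw the next token id at random instead of taking the argmax or
    the best beam; restricting to the top_k most likely tokens keeps
    the sampling from drifting into very unlikely words."""
    scaled = logits / temperature
    top = np.argpartition(-scaled, top_k)[:top_k]   # indices of the k largest logits
    probs = np.exp(scaled[top] - scaled[top].max())
    probs /= probs.sum()
    return int(np.random.choice(top, p=probs))

# Example with a random 1000-token vocabulary.
print(sample_next_token(np.random.randn(1000)))
```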