I’m trying to build a general text-generation model like GPT-2, but with the Transformer model structure.
Is this possible? Are there any difficulties, problems, or tricks I should be aware of before I start?
As far as I know, beam search doesn’t work well for predicting long sequences, because it can lead to the degeneration problem. Should I use top-k together with top-p sampling during inference? I have implemented the top-p feature myself in OpenNMT-tf.
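
To make the question concrete, here is roughly what I mean by combining top-k with top-p. This is only a minimal pure-Python sketch, not the OpenNMT-tf code: the names `top_k_top_p_filter` and `sample_token` and their parameters are made up for illustration. It assumes top-k is applied first and that top-p is then computed on the distribution renormalized over the top-k tokens, which is one possible ordering among several.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Return a probability distribution restricted by top-k, then top-p.

    logits: list of unnormalized scores, one per vocabulary entry.
    top_k:  keep only the k highest-scoring tokens (0 disables it).
    top_p:  keep the smallest set of tokens whose cumulative probability
            reaches top_p (1.0 disables it).
    """
    # Softmax, subtracting the max logit for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ranked by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: truncate the ranked list to k entries.
    if top_k > 0:
        order = order[:top_k]

    # Assumption: top-p is measured on the distribution renormalized
    # over the tokens that survived top-k.
    top_k_mass = sum(probs[i] for i in order)

    # Top-p: keep tokens until the cumulative mass reaches top_p.
    # The token that crosses the threshold is kept as well.
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / top_k_mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens; all others get probability 0.
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered


def sample_token(logits, top_k=0, top_p=1.0, rng=random):
    """Sample one token id from the filtered distribution."""
    probs = top_k_top_p_filter(logits, top_k=top_k, top_p=top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

At each decoding step I would call something like `sample_token(step_logits, top_k=40, top_p=0.9)` instead of taking the argmax or expanding a beam; the values 40 and 0.9 are just placeholders, not tuned settings.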
Thank you very much!