Hi, I’m trying to reproduce the paper R-Drop with OpenNMT-tf: R-Drop: Regularized Dropout for Neural Networks
First, during training, I need to repeat the input data x and concatenate the copies ([x; x]) along the batch dimension.
Second, I add an R-Drop loss computation function in loss.py.
Third, I modify the loss computation used for the training data (sequence_to_sequence compute_loss), adding a parameter that toggles R-Drop on and off; when R-Drop is enabled, the function returns the R-Drop loss instead of the plain cross entropy (a rough sketch of that loss follows below).
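For context, the R-Drop loss I have in mind is the cross entropy over the duplicated batch plus a weighted symmetric KL divergence between the two dropout passes. A minimal sketch of what I am adding to loss.py (the function name, the alpha value, and the omission of padding masks are my own simplifications, not actual OpenNMT-tf code):

import tensorflow as tf

def rdrop_loss(logits, labels, alpha=5.0):
    # Cross entropy over the full duplicated batch [x; x].
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    ce = tf.reduce_mean(ce)
    # Split the two dropout passes and compute the symmetric KL term.
    logits1, logits2 = tf.split(logits, 2, axis=0)
    log_p = tf.nn.log_softmax(logits1, axis=-1)
    log_q = tf.nn.log_softmax(logits2, axis=-1)
    kl_pq = tf.reduce_sum(tf.exp(log_p) * (log_p - log_q), axis=-1)
    kl_qp = tf.reduce_sum(tf.exp(log_q) * (log_q - log_p), axis=-1)
    kl = tf.reduce_mean(kl_pq + kl_qp) / 2.0
    return ce + alpha * kl

The real version would also need to apply the sequence length mask, the same way the existing cross entropy in OpenNMT-tf does.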
But I can’t figure out how to repeat the input data x and concatenate the copies ([x; x]) along the batch dimension. Moreover, if I want to train on multiple GPUs, the situation becomes even more complicated.
Hi,
You probably want to make this change in the model call method? There you can access the features and labels, which should correspond to x and y.
You can apply the concatenation to all inner features using something like this:
features = tf.nest.map_structure(lambda x: tf.concat([x, x], 0), features)
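For example, in a model subclass you could duplicate both features and labels before calling the parent implementation. A minimal sketch, assuming the usual call(features, labels, training, step) signature (the class name RDropTransformer is just an example, and the base class and exact signature may differ depending on your OpenNMT-tf version):

import tensorflow as tf
import opennmt as onmt

class RDropTransformer(onmt.models.TransformerBase):
    def call(self, features, labels=None, training=None, step=None):
        if training:
            # Repeat every feature/label tensor along the batch axis so the two
            # halves go through different dropout masks.
            features = tf.nest.map_structure(lambda x: tf.concat([x, x], 0), features)
            if labels is not None:
                labels = tf.nest.map_structure(lambda x: tf.concat([x, x], 0), labels)
        return super().call(features, labels=labels, training=training, step=step)

Your loss function will then see a batch of size 2B and can split it back into the two passes.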
Thank you very much! I will try it.
Hello, have you figured it out? I am also working on how to train a model with R-Drop.