I have a question about SubLayerConnection in the transformer: is the code wrong?

CuriousCat-7 · November 19, 2018, 11:33am

In the code, the residual connection is applied as x + dropout(sublayer(layer_norm(x))), while the formula in the paper and the tutorial is layer_norm(x + dropout(sublayer(x))).
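To make the difference concrete, here is a minimal sketch of the two orderings in plain Python. It is only an illustration, not the library's code: `sublayer` is a hypothetical stand-in for attention or feed-forward, `layer_norm` has no learned gain or bias, and `dropout` is treated as the identity (as at inference time).

```python
import math

def layer_norm(x, eps=1e-6):
    # Normalize a vector to zero mean and unit variance (no learned gain/bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def dropout(x):
    # Assumption: dropout disabled, so it acts as the identity.
    return x

def sublayer(x):
    # Hypothetical stand-in for an attention or feed-forward block.
    return [2.0 * v + 1.0 for v in x]

def pre_norm(x):
    # Ordering in the code: x + dropout(sublayer(layer_norm(x)))
    h = dropout(sublayer(layer_norm(x)))
    return [a + b for a, b in zip(x, h)]

def post_norm(x):
    # Ordering in the paper: layer_norm(x + dropout(sublayer(x)))
    h = dropout(sublayer(x))
    return layer_norm([a + b for a, b in zip(x, h)])

x = [1.0, 2.0, 4.0]
pre = pre_norm(x)
post = post_norm(x)
print("pre-norm: ", pre)
print("post-norm:", post)
```

In the first ordering the raw x passes through the residual path unnormalized, so the output is generally not zero-mean; in the second, the output of every sublayer is itself normalized.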

Is the different ordering a mistake in your code, or is it a deliberate choice?

Thanks

guillaumekln · November 19, 2018, 11:36am

See here:

CuriousCat-7 · November 19, 2018, 1:04pm

aha, thank you