Would anyone mind pointing me towards an explanation of the Mean encoder? Is it performing the same mean pooling you’d see on an image, even when the input is text? If so, where are the stride and kernel-size parameters normally associated with mean pooling?
```python
mean = emb.mean(0).expand(self.num_layers, batch, emb_dim)
```
Is this portion of the code squashing the embedding in some way?
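For context, here is a small sketch of what I think that line is doing, assuming `emb` has the usual `(seq_len, batch, emb_dim)` layout of a seq2seq encoder (the shapes and variable names here are my guesses, not from the actual source):

```python
import torch

# Hypothetical shapes mirroring the snippet: emb is (seq_len, batch, emb_dim).
seq_len, batch, emb_dim, num_layers = 5, 2, 4, 3
emb = torch.randn(seq_len, batch, emb_dim)

# mean(0) averages over the whole sequence dimension at once, so there is no
# kernel size or stride involved -- the entire time axis collapses to one vector
# per (batch, emb_dim) position.
mean = emb.mean(0)                               # shape: (batch, emb_dim)

# expand then copies that single averaged state to every decoder layer.
state = mean.expand(num_layers, batch, emb_dim)  # shape: (num_layers, batch, emb_dim)

print(mean.shape)   # torch.Size([2, 4])
print(state.shape)  # torch.Size([3, 2, 4])
```

So, if I read it right, it averages the whole sequence rather than sliding a pooling window over it, which would explain why there are no stride or size attributes.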
Thanks for any help.