Influence of the parameter "number of heads" on the size of the model

Guys, can you tell me whether “number_of_heads” (a transformer parameter) affects the size of the output model in MB?

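No, in a standard transformer the number of attention heads does not change the model size. The model dimension is split across the heads (each head has dimension d_model / number_of_heads), so the total number of weights stays the same.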
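If you want to check this yourself, here is a rough sketch that counts the parameters of one attention layer by hand. It assumes the usual design where the per-head dimension is d_model // num_heads (as in the original Transformer and in PyTorch's nn.MultiheadAttention) and that every projection has a bias. The function name attention_param_count and the value d_model = 512 are just made up for the illustration.

```python
def attention_param_count(d_model: int, num_heads: int) -> int:
    """Count weights and biases in one multi-head attention layer.

    Assumption: head_dim = d_model // num_heads, and all projections have biases.
    """
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads  # each head gets a slice of the model width

    # Q, K and V projections: every head maps d_model -> head_dim, plus a bias.
    per_head_qkv = 3 * (d_model * head_dim + head_dim)
    qkv_total = num_heads * per_head_qkv

    # Output projection: concatenated heads (num_heads * head_dim) -> d_model.
    out_proj = (num_heads * head_dim) * d_model + d_model

    return qkv_total + out_proj


# Same parameter count for every head setting that divides d_model.
for heads in (1, 2, 4, 8, 16):
    print(heads, attention_param_count(512, heads))
```

Every line prints the same count, so the saved file should be the same size too. This only holds when the head dimension is derived from d_model; if an implementation lets you set the per-head dimension independently, adding heads would add weights.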
