Guys, can you tell me whether “number_of_heads” (a transformer parameter) affects the size of the output model in MB?
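No, the number of attention heads does not change the model size, at least in the standard setup where the model dimension is split evenly across the heads (head_dim = d_model / number_of_heads). More heads means smaller heads, so the Q, K, V and output projections keep the same total number of weights, and the saved file stays the same size in MB. The size would only change if your implementation fixes the per-head dimension independently of the head count.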
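If you want to check this yourself, here is a rough sketch that counts the parameters of one attention layer. It assumes the standard multi-head attention layout (Q, K, V and output projections, each with a bias, and head_dim = d_model / num_heads). The function name `attention_param_count` and the value d_model = 512 are made up for illustration and are not taken from any particular library.

```python
def attention_param_count(d_model: int, num_heads: int) -> int:
    """Count weights and biases in one standard multi-head attention layer."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    # Assumption: the model dimension is split evenly across the heads.
    head_dim = d_model // num_heads
    # Q, K and V projections: each head maps d_model -> head_dim, plus a bias.
    per_head = 3 * (d_model * head_dim + head_dim)
    # Output projection: concatenated heads (num_heads * head_dim) -> d_model, plus a bias.
    output_proj = (num_heads * head_dim) * d_model + d_model
    return num_heads * per_head + output_proj


# Hypothetical d_model of 512; the count is the same for every head count.
for heads in (1, 2, 4, 8, 16):
    print(heads, attention_param_count(512, heads))
```

Every row prints the same count, because num_heads * head_dim always multiplies back to d_model. If your framework lets you set the per-head size separately from the head count, this assumption no longer holds and the count will grow with the number of heads.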