Cross-dependence of training parameters

Sometimes when I try to optimize my model I get a message like "parameter X must be parameter Y * 8" and so on. Does anybody know where I can find a table of these dependencies between parameters? In particular, I am asking about the parameters below:

num_units=* ,
num_heads=* ,


I think the only requirement is that num_units must be divisible by num_heads.
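
To check a combination before starting a run, here is a minimal sketch that assumes divisibility really is the only constraint. The helper names `check_attention_dims` and `valid_num_heads` are made up for illustration and are not part of any toolkit.

```python
from typing import List


def check_attention_dims(num_units: int, num_heads: int) -> int:
    """Return the per-head dimension, or raise if num_units cannot be split evenly.

    Assumption: the only rule is num_units % num_heads == 0.
    """
    if num_units <= 0 or num_heads <= 0:
        raise ValueError("num_units and num_heads must be positive")
    if num_units % num_heads != 0:
        raise ValueError(
            f"num_units ({num_units}) must be divisible by num_heads ({num_heads})"
        )
    return num_units // num_heads


def valid_num_heads(num_units: int) -> List[int]:
    """List every num_heads value that divides num_units evenly."""
    return [h for h in range(1, num_units + 1) if num_units % h == 0]


if __name__ == "__main__":
    print(check_attention_dims(512, 8))
    print(valid_num_heads(12))
```

With `num_units=512` and `num_heads=8`, each head would get 64 units; a pair such as 512 and 6 would be rejected.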

Thanks for the reply.

One more question.
In your opinion, what are the minimal values we need to set for the parameters from my previous post to get a valid training run? I need them for some smoke test cases.


I don’t have an answer to that. You would need to run experiments on your training data.