Hi,
I would like to be able to test averaging, for instance, 5 models from 5 epochs => final model.
How easy would that be to implement?
Thanks,
Vincent
Relatively easy.
Let me know if you want to implement this. I can give you pointers.
Here is a starter:
require('onmt.init')

-- Append each entry of params at the end of store.
local function appendParameters(store, params)
  for _, p in ipairs(params) do
    table.insert(store, p)
  end
end

-- Recursively gather the parameter tensors of a model and its submodules.
local function gatherModelParameters(model, store)
  store = store or {}

  for _, submodule in pairs(model.modules) do
    if torch.type(submodule) == 'table' and submodule.modules then
      gatherModelParameters(submodule, store)
    else
      appendParameters(store, submodule:parameters())
    end
  end

  return store
end

-- Gather the parameters of all models (e.g. encoder and decoder) in one table.
local function gatherParameters(models)
  local parameters = {}

  for _, model in pairs(models) do
    appendParameters(parameters, gatherModelParameters(model))
  end

  return parameters
end

local checkpoint1 = torch.load('/home/klein/models/OpenNMT/baseline-1M-enfr_epoch1_8.49_release.t7')
local params1 = gatherParameters(checkpoint1.models)
print(params1)
It loads a first model and gathers all parameters in a single table params1.
Then it is about loading the other checkpoints the same way and averaging their parameters into params1.
The script can be inspired by tools/release_model.lua
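To make the remaining steps concrete, here is a minimal sketch of how the other checkpoints could be folded in, continuing the starter above. It assumes all checkpoints share the same architecture so the parameter tables line up one-to-one; the file names and the output path are placeholders.

-- Placeholder paths for the remaining checkpoints to average.
local otherFiles = {
  'epoch2.t7',
  'epoch3.t7'
}

for _, file in ipairs(otherFiles) do
  local checkpoint = torch.load(file)
  local params = gatherParameters(checkpoint.models)
  for i = 1, #params1 do
    -- Accumulate the sum of each parameter tensor in place.
    params1[i]:add(params[i])
  end
end

-- Divide by the total number of models to get the arithmetic average.
for i = 1, #params1 do
  params1[i]:div(1 + #otherFiles)
end

-- params1 holds references to the tensors inside checkpoint1, so the first
-- checkpoint now contains the averaged weights and can be saved directly.
torch.save('average.t7', checkpoint1)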
If I read you well, it'd better be an extra external tool rather than a training option, right?
Yes, it seems the easiest way to achieve this. Something like:
th tools/average_models.lua -models epoch1.t7 epoch2.t7 epoch3.t7 -output_model average.t7
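For illustration, a hypothetical skeleton of such a standalone tool is sketched below; the actual script that was merged later may be organized differently. The arguments are parsed by hand only so that -models can take several space-separated files as in the command above; the body would then reuse gatherParameters and the averaging loop from the starter.

-- Hypothetical skeleton for a standalone averaging tool (not the actual
-- merged script). It only parses the command line shown above.
local modelFiles = {}
local outputModel = 'average.t7'

local i = 1
while i <= #arg do
  if arg[i] == '-models' then
    i = i + 1
    while i <= #arg and arg[i]:sub(1, 1) ~= '-' do
      table.insert(modelFiles, arg[i])
      i = i + 1
    end
  elseif arg[i] == '-output_model' then
    outputModel = arg[i + 1]
    i = i + 2
  else
    i = i + 1
  end
end

assert(#modelFiles > 1, 'at least two models are needed for averaging')
print(('averaging %d models into %s'):format(#modelFiles, outputModel))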
What makes more sense? To store a moving average, i.e. in the end the last model's weight will be much bigger than the previous ones, or to just make an arithmetic average of all of them?
It seems I more precisely meant a cumulative moving average, which is the standard average but updated at each sample so that at most 2 sets of parameters are kept in memory at all times.
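As a sketch, that incremental update could look like the following (the function and variable names are illustrative); applied with n = 2, 3, ... for the second, third, ... checkpoint, it yields exactly the arithmetic mean while keeping only the running average and the current model in memory.

-- Cumulative moving average update: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
-- avgParams is the running average, newParams the n-th model's parameters.
local function updateAverage(avgParams, newParams, n)
  for i = 1, #avgParams do
    -- In-place Torch update: avg = avg + (new - avg) / n.
    avgParams[i]:add(newParams[i]:clone():add(-1, avgParams[i]):div(n))
  end
end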
Thanks to @guillaumekln and his patience I have the code working fine, but unfortunately it does not bring significant improvements.
I don't think it's worth committing this.
According to a discussion we had with Marcin Junczys-Dowmunt, the cumulative moving average is a way to reduce variability between trainings.
@vince62s Did you just try with the last N models of a training?
If you have a working script, I think it is worth sharing it at least via a Gist. Hopefully other people can try it and find a pattern where it works well.
Easier here:
As the script is simple, it has been merged!