Averaging models from several epochs

vince62s · June 26, 2017, 11:54am

Hi,

I would like to test model averaging, for instance combining the 5 models from 5 epochs into a single final model.

How easy would that be to implement?

Thanks,
Vincent

guillaumekln · June 26, 2017, 1:33pm

Relatively easy.

Let me know if you want to implement this. I can give you pointers.

guillaumekln · June 26, 2017, 2:25pm

Here is a starter:

require('onmt.init')

-- Append every tensor in `params` to the flat list `store`.
local function appendParameters(store, params)
  for _, p in ipairs(params) do
    table.insert(store, p)
  end
end

-- Recursively collect the parameters of one model into `store`.
local function gatherModelParameters(model, store)
  store = store or {}

  for _, submodule in pairs(model.modules) do
    if torch.type(submodule) == 'table' and submodule.modules then
      -- Plain table that itself holds modules: recurse into it.
      gatherModelParameters(submodule, store)
    else
      -- Actual module: take its parameter tensors.
      appendParameters(store, submodule:parameters())
    end
  end

  return store
end

-- Collect the parameters of all models stored in a checkpoint.
local function gatherParameters(models)
  local parameters = {}

  for _, model in pairs(models) do
    appendParameters(parameters, gatherModelParameters(model))
  end

  return parameters
end

local checkpoint1 = torch.load('/home/klein/models/OpenNMT/baseline-1M-enfr_epoch1_8.49_release.t7')

-- Flat table holding every parameter tensor of the first checkpoint.
local params1 = gatherParameters(checkpoint1.models)
print(params1)

It loads a first model and gathers all parameters in a single table params1.

The remaining steps are:

Doing the same for the next model
Iterating over each item and storing the moving average into params1
Saving the first model under a new name

The script can be modeled on tools/release_model.lua
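
The three steps above can be sketched in plain Lua roughly as follows. This is only an illustration under assumptions: each parameter set is a list of flat number arrays standing in for the Torch tensors that gatherParameters returns, and the helper names updateAverage and averageParameterSets are hypothetical, not part of OpenNMT. With real tensors the inner loop would presumably become in-place tensor arithmetic, and the result would still need to be saved with the first checkpoint.

```lua
-- Sketch only: plain Lua tables of numbers stand in for Torch tensors.
-- updateAverage and averageParameterSets are hypothetical helper names.

-- Fold one more parameter set into the running average, in place.
-- `count` is the number of sets already averaged into `avg`.
local function updateAverage(avg, params, count)
  assert(#avg == #params, 'models must have the same number of parameters')
  for i = 1, #avg do
    local a, p = avg[i], params[i]
    assert(#a == #p, 'parameter ' .. i .. ' has a different size')
    for j = 1, #a do
      -- Cumulative moving average: avg = (count * avg + p) / (count + 1)
      a[j] = (count * a[j] + p[j]) / (count + 1)
    end
  end
  return avg
end

-- Average a list of parameter sets. The first set is reused as the
-- result, mirroring "storing the moving average into params1".
local function averageParameterSets(sets)
  local avg = sets[1]
  for n = 2, #sets do
    updateAverage(avg, sets[n], n - 1)
  end
  return avg
end

-- Toy example: three "models", each with two parameter arrays.
local params1 = { {1, 2}, {10} }
local params2 = { {3, 4}, {20} }
local params3 = { {5, 6}, {30} }

local averaged = averageParameterSets({ params1, params2, params3 })
print(averaged[1][1], averaged[1][2], averaged[2][1])
```

Because the first set is overwritten, only the running average and the set currently being folded in need to be alive at once, provided the models are loaded one at a time rather than all up front as in this toy example.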

vince62s · June 26, 2017, 2:52pm

If I understand you correctly, this would be better as a separate external tool rather than a training option, right?

guillaumekln · June 26, 2017, 2:54pm

Yes, it seems the easiest way to achieve this. Something like:

th tools/average_models.lua -models epoch1.t7 epoch2.t7 epoch3.t7 -output_model average.t7

vince62s · June 26, 2017, 6:11pm

What makes more sense: storing a moving average, i.e. in the end the last model's weights will count much more than the previous ones, or just taking an arithmetic average of all of them?

guillaumekln · June 26, 2017, 7:15pm

More precisely, I meant a cumulative moving average: it is the standard arithmetic average, but updated at each new sample so that at most 2 sets of parameters are held in memory at any time.
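
To make the distinction concrete, here is a minimal sketch in plain Lua that contrasts the cumulative moving average with a naive "average the running value with the new one" update. Scalars stand in for whole parameter sets, and the function names cumulativeAverage and pairwiseAverage are hypothetical, chosen only for this illustration.

```lua
-- Sketch only: scalars stand in for whole parameter sets.
-- cumulativeAverage and pairwiseAverage are hypothetical names.

-- Cumulative moving average: after n samples, every sample has
-- weight 1/n, so the result equals the arithmetic mean.
local function cumulativeAverage(values)
  local avg = 0
  for n, x in ipairs(values) do
    avg = avg + (x - avg) / n
  end
  return avg
end

-- Naive update: average the running value with each new sample.
-- The most recent sample gets weight 1/2, and the weight of older
-- samples is halved at every step.
local function pairwiseAverage(values)
  local avg = values[1]
  for n = 2, #values do
    avg = (avg + values[n]) / 2
  end
  return avg
end

local values = { 2, 4, 6, 8 }
local cma = cumulativeAverage(values)
local pairwise = pairwiseAverage(values)
print(cma, pairwise)
```

For these four values the cumulative form lands on the arithmetic mean, while the naive form is pulled toward the last value, which is the effect the question above was worried about.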

vince62s · June 26, 2017, 8:44pm

Thanks to @guillaumekln and his patience, I have the code working fine, but unfortunately it does not bring significant improvements.

I don’t think it’s worth committing this.

jean.senellart · June 27, 2017, 4:13am

According to a discussion we had with Marcin Junczys-Dowmunt, the cumulative moving average is a way to reduce variability between training runs.

guillaumekln · June 27, 2017, 7:31am

@vince62s Did you just try with the last N models of a training?

If you have a working script, I think it is worth sharing it at least via a Gist. Hopefully other people can try it and find a pattern where it works well.

vince62s · June 27, 2017, 8:14am

Easier here:

guillaumekln · June 27, 2017, 10:16am

As the script is simple, it has been merged!