Averaging models from several epochs


(Vincent Nguyen) #1

Hi,

I would like to test averaging several models, for instance the 5 models saved after 5 epochs, into a single final model.

How easy would that be to implement?

Thanks,
Vincent


(Guillaume Klein) #2

Relatively easy.

Let me know if you want to implement this. I can give you pointers.


(Guillaume Klein) #3

Here is a starter:

require('onmt.init')

-- Append every tensor in `params` to `store`. An nn module without learnable
-- parameters may return nil from parameters(), so nil is treated as empty.
local function appendParameters(store, params)
  for _, p in ipairs(params or {}) do
    table.insert(store, p)
  end
end

-- Recursively collect the parameter tensors of one serialized model.
local function gatherModelParameters(model, store)
  store = store or {}

  for _, submodule in pairs(model.modules) do
    if torch.type(submodule) == 'table' and submodule.modules then
      -- Nested serialized model: recurse into it.
      gatherModelParameters(submodule, store)
    else
      appendParameters(store, submodule:parameters())
    end
  end

  return store
end

-- Flatten the parameters of all models in a checkpoint into a single table.
local function gatherParameters(models)
  local parameters = {}

  for _, model in pairs(models) do
    appendParameters(parameters, gatherModelParameters(model))
  end

  return parameters
end

local checkpoint1 = torch.load('/home/klein/models/OpenNMT/baseline-1M-enfr_epoch1_8.49_release.t7')

local params1 = gatherParameters(checkpoint1.models)
print(params1)

It loads a first model and gathers all parameters in a single table params1.

The remaining steps are:

  1. Doing the same for the next model
  2. Iterating over each item and storing the moving average into params1
  3. Saving the first model with a new name
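
Steps 1 and 2 could be sketched in plain Lua roughly as follows. This is only an illustration under stated assumptions, not the script that was eventually written: each parameter set is modeled as a flat table of numbers standing in for a Torch tensor, and `averageInto` is a hypothetical helper name.

```lua
-- Hypothetical helper: folds the parameters of the n-th model into the
-- running average stored in `avg`. Each entry of `avg` and `params` is a flat
-- table of numbers standing in for a Torch tensor (an assumption made so that
-- the sketch runs in plain Lua).
local function averageInto(avg, params, n)
  assert(#avg == #params, 'models must have the same number of parameter sets')
  for i, p in ipairs(params) do
    local a = avg[i]
    assert(#a == #p, 'parameter sets must have the same size')
    for j = 1, #p do
      -- Running average update: new = (old * (n - 1) + value) / n
      a[j] = (a[j] * (n - 1) + p[j]) / n
    end
  end
  return avg
end

-- Toy stand-ins for the parameters gathered from three checkpoints.
local params1 = { {1, 2}, {3} }
local params2 = { {3, 4}, {5} }
local params3 = { {5, 6}, {7} }

averageInto(params1, params2, 2) -- second model
averageInto(params1, params3, 3) -- third model
-- params1 now holds the element-wise mean of the three models.
```

With real checkpoints, the inner loop would presumably become in-place tensor operations on the tables returned by gatherParameters, and step 3 would write the first checkpoint back out with torch.save under a new file name.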

The script can be modeled on tools/release_model.lua


(Vincent Nguyen) #4

If I read you correctly, this should be a separate external tool rather than a training option, right?


(Guillaume Klein) #5

Yes, it seems the easiest way to achieve this. Something like:

th tools/average_models.lua -models epoch1.t7 epoch2.t7 epoch3.t7 -output_model average.t7

(Vincent Nguyen) #6

Which makes more sense: storing a moving average, in which case the last model's weights end up counting for much more than the earlier ones, or simply taking the arithmetic average of all of them?


(Guillaume Klein) #7

To be more precise, I meant a cumulative moving average: it is the standard average, but updated after each new sample (here, each model), so that at most 2 sets of parameters are in memory at any time.
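
For reference, the update can be written out as below. The notation is introduced here for illustration and is not taken from the thread: $\theta_n$ denotes the parameters of the $n$-th model and $\bar{\theta}_n$ the average of the first $n$ models.

```latex
\bar{\theta}_1 = \theta_1, \qquad
\bar{\theta}_n = \frac{(n-1)\,\bar{\theta}_{n-1} + \theta_n}{n}
             = \bar{\theta}_{n-1} + \frac{\theta_n - \bar{\theta}_{n-1}}{n}
\quad\Longrightarrow\quad
\bar{\theta}_N = \frac{1}{N} \sum_{n=1}^{N} \theta_n
```

Every model therefore contributes with the same weight $1/N$, so the result is exactly the arithmetic average and the last model is not weighted more heavily. Only the running average and the model currently being added need to be held in memory.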


(Vincent Nguyen) #8

Thanks to @guillaumekln and his patience, I have the code working fine, but unfortunately it does not bring significant improvements.

I don't think it is worth committing this.


(jean.senellart) #9

According to a discussion we had with Marcin Junczys-Dowmunt, the cumulative moving average is a way to reduce variability between training runs.


(Guillaume Klein) #10

@vince62s Did you just try with the last N models of a training?

If you have a working script, I think it is worth sharing it at least via a Gist. Hopefully other people can try it and find a pattern where it works well.


(Vincent Nguyen) #11

Easier here:


(Guillaume Klein) #12

As the script is simple, it has been merged!