Too many trainable parameters with Transformer model

huyenvt · January 24, 2019, 3:12am

Hi all,
I am using a Transformer model to train on our dataset (5M sentences), but the total number of trainable parameters is 139M. In the "Attention Is All You Need" paper, the model has only 65M parameters. Why does mine have so many?
Could someone explain to me?
Thanks.
Huyen

guillaumekln · January 24, 2019, 7:10am

Hi,

Can you be more specific about the OpenNMT version you are using and your training options?

huyenvt · January 25, 2019, 8:00am

Hi,
I use OpenNMT-tf with the TransformerAAN model and num_layers = 2; other options are default. I use tensorflow-gpu version 1.12.

guillaumekln · January 25, 2019, 8:17am

What is the size of your vocabulary?

Additionally, the TransformerAAN model is not the one used in Google's paper.

huyenvt · January 28, 2019, 3:37pm

My vocab is about 6M sentences. I will try Transformer-base.

huyenvt · February 11, 2019, 6:49am

Hi Guillaume,
I am now using the Transformer model. It runs with 150663034 trainable parameters; source vocab: 54569, target vocab: 76665.

guillaumekln · February 11, 2019, 8:40am

Sounds about right. Is it an issue?
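
To see why that number is plausible, here is a rough back-of-the-envelope count. It is only a sketch under assumptions that are not stated in this thread: base dimensions (d_model = 512, feed-forward size 2048, 6 layers), biases on every dense layer, one layer norm per sublayer plus a final one per stack, separate (untied) source embeddings, target embeddings and output projection, and one extra out-of-vocabulary entry added to each vocabulary. The function names are made up for illustration and are not part of OpenNMT-tf.

```python
def attention_params(d_model):
    # Query, key, value and output projections: four d_model x d_model
    # matrices, each with a bias vector (assumed).
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model, d_ff):
    # Two dense layers with biases: d_model -> d_ff -> d_model.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    # One scale and one offset vector.
    return 2 * d_model


def transformer_params(src_vocab, tgt_vocab, d_model=512, d_ff=2048,
                       num_layers=6, tied=False):
    """Estimate trainable parameters under the assumptions listed above."""
    ln = layer_norm_params(d_model)
    enc_layer = attention_params(d_model) + ffn_params(d_model, d_ff) + 2 * ln
    dec_layer = 2 * attention_params(d_model) + ffn_params(d_model, d_ff) + 3 * ln
    # Assumed: one final layer norm after the encoder and one after the decoder.
    stacks = num_layers * (enc_layer + dec_layer) + 2 * ln
    if tied:
        # One matrix shared by both embeddings and the output projection,
        # plus an output bias. Only meaningful when src_vocab == tgt_vocab.
        vocab_part = src_vocab * d_model + tgt_vocab
    else:
        # Source embeddings, target embeddings, output projection with bias.
        vocab_part = (src_vocab * d_model
                      + tgt_vocab * d_model
                      + tgt_vocab * d_model + tgt_vocab)
    return stacks + vocab_part


# Vocabulary sizes from the post above, plus one assumed OOV entry each.
reported_setup = transformer_params(54569 + 1, 76665 + 1)
# Roughly the paper's setup: a shared vocabulary of about 37000 tokens
# with tied weights.
paper_like = transformer_params(37000, 37000, tied=True)

print("untied, large vocabularies:", reported_setup)
print("tied, ~37k shared vocabulary:", paper_like)
```

Under these assumptions the first count comes out to the 150663034 reported above, and over 100M of it sits in the three vocabulary-sized matrices. The encoder and decoder stacks themselves account for only about 44M, so the gap with the paper's 65M comes mostly from vocabulary size and weight sharing rather than from the layers.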