Too many trainable parameters with Transformer model


(Vu Thuong Huyen) #1

Hi all,
I am using the Transformer model to train on our dataset (5M sentences), but the total number of trainable parameters is 139M. In the "Attention Is All You Need" paper, the model has only 65M parameters. Why does mine have so many?
Could someone explain to me?

(Guillaume Klein) #2


Can you be more specific about the OpenNMT version you are using and your training options?

(Vu Thuong Huyen) #3

I use OpenNMT-tf with the TransformerAAN model and num_layers = 2; other options are default. I use tensorflow-gpu version 1.12.

(Guillaume Klein) #4

What is the size of your vocabulary?

Additionally, the TransformerAAN model is not the one used in Google's paper.

(Vu Thuong Huyen) #5

My vocab is about 6M sentences. I will try Transformer-base.

(Vu Thuong Huyen) #6

Hi Guillaume,
I am now using the Transformer model. It runs with 150663034 trainable parameters, source vocab: 54569, target vocab: 76665.

(Guillaume Klein) #7

Sounds about right. Is it an issue?
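
For anyone wondering where the parameters go, here is a rough back-of-the-envelope count in Python. It is only a sketch under assumptions that are not stated in this thread: base hyperparameters (6 layers, model size 512, feed-forward size 2048), biases on every dense layer, separate (untied) source embeddings, target embeddings and output layer, one extra out-of-vocabulary entry per vocabulary, and one final layer normalization each for the encoder and the decoder. The function names are made up for illustration and are not part of OpenNMT-tf.

```python
def attention_params(d_model):
    # Query, key, value and output projections, each a dense layer with bias.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model, d_ff):
    # Two dense layers with biases: d_model -> d_ff -> d_model.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    # One scale vector and one offset vector.
    return 2 * d_model


def transformer_params(src_vocab, tgt_vocab, num_layers=6, d_model=512,
                       d_ff=2048, extra_tokens=1):
    # Assumption: each vocabulary gets `extra_tokens` additional entries (e.g. OOV).
    src = src_vocab + extra_tokens
    tgt = tgt_vocab + extra_tokens

    # Assumption: nothing is shared between embeddings and the output layer.
    embeddings = src * d_model + tgt * d_model
    output_layer = tgt * d_model + tgt

    # Encoder layer: self-attention + feed-forward, one layer norm for each.
    enc_layer = (attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    # Decoder layer: self-attention + encoder-decoder attention + feed-forward.
    dec_layer = (2 * attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    # Assumption: one final layer norm after the encoder and one after the decoder.
    body = num_layers * (enc_layer + dec_layer) + 2 * layer_norm_params(d_model)

    return {
        "embeddings": embeddings,
        "output_layer": output_layer,
        "body": body,
        "total": embeddings + output_layer + body,
    }


counts = transformer_params(src_vocab=54569, tgt_vocab=76665)
for name, value in counts.items():
    print(f"{name:>12}: {value:,}")
```

Under these assumptions the total comes out to 150,663,034, the figure reported in post #6. Only about 44M of that is in the encoder and decoder layers; the remaining 106M or so sits in the two embedding matrices and the output layer, so the count is driven mostly by vocabulary size. The paper uses a shared source-target vocabulary of about 37,000 subword tokens with shared embedding weights, which keeps that part far smaller (37,000 x 512 is roughly 19M).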