Training a language model


This page has a warning saying that the LM and the decoder need to use the same features and dictionary.

I am building a speech recognition system, and the input to my decoder is Kaldi features, passed with -idx_files.

Could you please explain what the warning on that page means? Is it possible to use a character-based LM while using filter bank features for the decoder during translation?

Thank you,

Hi @Shruti, the warning refers to the target features. As long as your LM uses the same target features (case?) and vocabulary (characters) as your seq2seq system, it will be OK.
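
If it helps, here is a minimal sketch of how you could check that the two vocabularies match before decoding. It assumes each dictionary is a plain-text file with one entry per line and the token in the first whitespace-separated column; that layout, the function names `load_vocab` and `compare_vocabs`, and the toy entries are all hypothetical, so adapt them to your actual files.

```python
def load_vocab(lines):
    """Collect the first whitespace-separated token of each non-empty line.

    Assumption: one dictionary entry per line, token in the first column.
    """
    vocab = set()
    for line in lines:
        parts = line.split()
        if parts:
            vocab.add(parts[0])
    return vocab


def compare_vocabs(lm_vocab, tgt_vocab):
    """Report tokens that appear in only one of the two vocabularies."""
    return {
        "only_in_lm": sorted(lm_vocab - tgt_vocab),
        "only_in_target": sorted(tgt_vocab - lm_vocab),
    }


# Toy character dictionaries (hypothetical contents). In practice you would
# pass open file handles for the LM dictionary and the seq2seq target dictionary.
lm_lines = ["a 1", "b 2", "c 3", "<unk> 4"]
tgt_lines = ["a 1", "b 2", "C 3", "<unk> 4"]

report = compare_vocabs(load_vocab(lm_lines), load_vocab(tgt_lines))
print(report)
```

Both lists should come back empty. In the toy data the two sides differ only in case ("c" versus "C"), which is the kind of target-feature mismatch the warning is about. The source-side input (your filter bank features) is not part of this comparison.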
