Speech-to-Text Preprocessing

I have built a speech-to-text model, but it is not producing the results I expected. What can I try at the preprocessing stage to get better results? I am currently using the VCTK corpus to train the model.
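A typical preprocessing pipeline of the kind this question is about looks roughly like the minimal NumPy-only sketch below. It covers silence trimming, pre-emphasis, log-mel features and per-utterance mean/variance normalization. The helper names are hypothetical and are not taken from any library. The parameter values are common defaults that have not been tuned for VCTK: 16 kHz audio, a 25 ms window, a 10 ms hop, 40 mel bands and a -40 dB silence threshold. The sketch also assumes the audio is already loaded as a float array and resampled to 16 kHz. VCTK is distributed at 48 kHz, so in practice that means a downsampling step first.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    # y[t] = x[t] - coeff * x[t-1]: boosts high frequencies; length is preserved
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def trim_silence(signal, frame_len=400, threshold_db=-40.0):
    # Drop leading/trailing frames whose RMS energy is more than |threshold_db|
    # below the loudest frame (any trailing partial frame is also dropped)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-10
    db = 20 * np.log10(rms / rms.max())
    voiced = np.where(db > threshold_db)[0]
    if voiced.size == 0:
        return signal[:0]
    return signal[voiced[0] * frame_len : (voiced[-1] + 1) * frame_len]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_features(signal, sr=16000, n_fft=512, win_len=400, hop=160, n_mels=40):
    # Frame into 25 ms Hamming windows with a 10 ms hop (at 16 kHz)
    if len(signal) < win_len:
        signal = np.pad(signal, (0, win_len - len(signal)))
    n_frames = 1 + (len(signal) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)
    # Power spectrum, then project onto the mel filterbank and take the log
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # Per-utterance mean/variance normalization (CMVN) over time
    return (log_mel - log_mel.mean(axis=0)) / (log_mel.std(axis=0) + 1e-10)

# Usage: a 1 s, 440 Hz tone padded with 0.5 s of silence on each side
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.zeros(sr // 2), 0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(sr // 2)])
trimmed = trim_silence(audio)
feats = log_mel_features(pre_emphasis(trimmed), sr=sr)
```

Each step targets a different issue. Trimming removes the long leading and trailing silences common in read-speech corpora. Normalization reduces per-speaker and per-session level differences. Whether any of these steps actually helps depends on the model, so comparing them one at a time on a held-out set is safer than applying all of them at once.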