File size increased 2.5 times after preprocessing for speech-to-text

amittiwari · October 11, 2019, 8:51am

I have 41 GB of data to preprocess. After preprocessing, it grew to approximately 100 GB.
What should I do to reduce the preprocessed file size?

francoishernandez · October 11, 2019, 9:48am

The order of magnitude seems on par with the demo dataset. This is almost certainly because preprocessing dumps the additional computed features into the shards. I'm not sure we can do anything apart from changing parameters such as frame duration/stride or feature size, but this may hurt your performance.
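
A rough back-of-the-envelope sketch of why stored features can be larger than the audio they come from. It assumes 16 kHz, 16-bit mono PCM source audio and features stored as a float32 linear-magnitude spectrogram with n_fft = sample_rate × window_size; these defaults are assumptions for illustration, not values confirmed in this thread. Under them, the features take about twice the bytes of the raw audio, and the stride and window (feature size) parameters scale that directly.

```python
def bytes_per_second(sample_rate=16000, window_size=0.02, window_stride=0.01,
                     bytes_per_value=4, raw_bytes_per_sample=2):
    """Return (raw_audio_bytes, feature_bytes) for one second of audio.

    Assumes: raw audio is uncompressed PCM with raw_bytes_per_sample bytes per
    sample, and features are a spectrogram with n_fft // 2 + 1 frequency bins
    per frame, stored with bytes_per_value bytes per value (4 = float32).
    """
    raw = sample_rate * raw_bytes_per_sample
    n_fft = int(round(sample_rate * window_size))   # samples per analysis window
    n_bins = n_fft // 2 + 1                          # feature size per frame
    hop = int(round(sample_rate * window_stride))    # samples between frames
    frames_per_second = sample_rate / hop
    feats = frames_per_second * n_bins * bytes_per_value
    return raw, feats


raw, feats = bytes_per_second()
print(raw, feats, feats / raw)        # 32000 64400.0 2.0125

# Doubling the stride halves the number of frames, and so the feature bytes.
raw, feats = bytes_per_second(window_stride=0.02)
print(raw, feats, feats / raw)        # 32000 32200.0 1.00625
```

Real on-disk sizes will also include serialization overhead, so this only explains the order of magnitude, not the exact 2.5× ratio.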
You could also try to compute features on the fly if you feel comfortable diving into the code.
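
A minimal sketch of computing features on the fly, assuming a dataset object that stores only file paths and a caller-supplied `load_fn` (hypothetical) that returns a mono waveform as a NumPy array. The spectrogram here is a plain Hann-windowed STFT magnitude with `log1p` compression written directly with NumPy; it is an illustration, not the toolkit's own feature extractor, and its defaults are assumptions. Only the raw audio stays on disk, and features are recomputed every time an example is fetched, trading extra CPU time per epoch for disk space.

```python
import numpy as np


def spectrogram(wave, sample_rate=16000, window_size=0.02, window_stride=0.01):
    """Log-compressed STFT magnitude, shape (n_fft // 2 + 1, n_frames), float32."""
    wave = np.asarray(wave, dtype=np.float64)
    n_fft = int(round(sample_rate * window_size))
    hop = int(round(sample_rate * window_stride))
    # Zero-pad very short clips so there is at least one full frame.
    if len(wave) < n_fft:
        wave = np.pad(wave, (0, n_fft - len(wave)))
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log1p(mag).T.astype(np.float32)


class OnTheFlyAudioDataset:
    """Keeps only paths; computes features lazily in __getitem__."""

    def __init__(self, paths, load_fn, **feat_kwargs):
        self.paths = list(paths)
        self.load_fn = load_fn          # hypothetical: path -> 1-D waveform
        self.feat_kwargs = feat_kwargs  # forwarded to spectrogram()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        wave = self.load_fn(self.paths[idx])
        return spectrogram(wave, **self.feat_kwargs)
```

For example, with a stand-in loader that returns one second of audio, `OnTheFlyAudioDataset(["a.wav"], lambda p: np.ones(16000))[0].shape` is `(161, 99)`. In practice `load_fn` would be a real audio file reader.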

vince62s · October 11, 2019, 6:42pm

The only way is to shard.

amittiwari · October 14, 2019, 12:00pm

Sharding is not helpful; I tried.

vince62s · October 14, 2019, 6:03pm

Not helpful in what sense?
It makes much smaller files that can be handled easily, no?

amittiwari · October 15, 2019, 11:46am

About the handling part, you are right, but the total preprocessed file size did not decrease.
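
A toy sketch of what sharding does to file sizes, using hypothetical numbers and assuming every example serializes to the same number of bytes: it caps the size of each individual file, which is what makes the data easier to handle, but the shard sizes always add up to the same total.

```python
def shard_sizes(num_examples, shard_size, bytes_per_example):
    """Split num_examples into shards of at most shard_size examples
    and return the byte size of each shard (equal-sized examples assumed)."""
    sizes = []
    for start in range(0, num_examples, shard_size):
        count = min(shard_size, num_examples - start)
        sizes.append(count * bytes_per_example)
    return sizes


# Hypothetical: 1,000,000 examples of 100 kB each, i.e. 100 GB in total.
sizes = shard_sizes(1_000_000, 100_000, 100_000)
print(len(sizes), max(sizes), sum(sizes))   # 10 10000000000 100000000000
```

Each of the 10 shards is 10 GB, but together they are still 100 GB; only shrinking the per-example features (or not storing them) reduces the total.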