Bigger VRAM means you can train bigger models (or models with longer sequence lengths). A Transformer-base model needs about 8 GB, while a Transformer-big needs ~16 GB.
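As a rough sketch of where that VRAM goes: with Adam at fp32, each parameter costs about 16 bytes of persistent state (weights + gradients + the two Adam moment buffers), and activations take up the rest, scaling with batch size and sequence length. The ~65M parameter count for Transformer-base below is from the original "Attention Is All You Need" paper; the 16-bytes-per-parameter rule of thumb is an assumption, not an exact figure.

```python
def training_state_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Rough persistent-state memory for fp32 training with Adam.

    Counts 4 copies per parameter: weights, gradients, and the
    two Adam moment estimates (m and v), all at fp32 (4 bytes).
    Activations are NOT included and usually dominate in practice.
    """
    copies = 4  # weights + grads + Adam m + Adam v
    return n_params * bytes_per_param * copies / 1e9

# Transformer-base is ~65M parameters (Vaswani et al., 2017).
print(f"Transformer-base state: ~{training_state_gb(65e6):.2f} GB")
```

The persistent state is only ~1 GB; the remaining headroom of an 8 GB card is eaten by activations, which is why longer sequences or bigger batches push you toward 16 GB+ cards.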
Obviously one or two 3090s is currently the best option, but they are overpriced; at current street prices, one costs about what two would have at MSRP.
I agree with James: the 3090 is an absolute beast, but a few notes:
If you plan to scale to more than 2 GPUs, RTX 3090s will be a pain. Each of them takes up 3 slots and consumes a lot of power (~350W). So you have to account for a large case, probably a new motherboard with enough well-spaced PCIe slots, and a power supply with the highest wattage you can find.
Depending on the prices in your region, I suggest you take a look at the RTX A5000 too. In my region, they sell for a little less than the RTX 3090, they take up only 2 slots, and they draw much less power (~230W). They also have vGPU software support. Their performance is a bit lower than the 3090's (but still excellent), and those advantages can't be ignored.