

Aside: WTF are they using SSDs for?
LLM inference in the cloud is done almost entirely in VRAM. Occasionally a stale K/V cache gets spilled to RAM, but newer attention architectures should minimize even that. And large-scale training, contrary to popular belief, is a rare event that most data centers and businesses aren't equipped to run.
…So what do they do with so much flash storage!? Is it literally just FOMO server buying?
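For a sense of scale, here's a rough sketch of the one thing that does sometimes spill out of VRAM. Every number below is my own illustrative assumption for a hypothetical Llama-70B-class model with grouped-query attention, not anyone's published spec:

```python
# Back-of-envelope K/V cache sizing. All numbers are assumptions for a
# hypothetical Llama-70B-class model with GQA, not published specs.
n_layers = 80        # transformer layers
n_kv_heads = 8       # GQA: K/V heads, far fewer than query heads
head_dim = 128       # dimension per head
bytes_per_elem = 2   # fp16/bf16

# K and V each store n_kv_heads * head_dim values per layer, per token.
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
print(f"{kv_per_token / 1024:.0f} KiB per token")  # -> 320 KiB

# One user parked at a 128k-token context:
seq_len = 128_000
print(f"{kv_per_token * seq_len / 2**30:.1f} GiB per sequence")  # -> 39.1 GiB
```

Tens of GiB per idle long-context user is exactly the kind of thing you spill to RAM, but it's still a RAM-scale problem, not a petabytes-of-flash problem.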

Again, I don’t buy this. The training data isn’t actually that big, nor is training run at such a huge scale often enough to matter.
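And a similarly rough sanity check on the "it's for training data" theory, again with my own assumed numbers (a frontier-scale run on the order of 15T tokens of plain text):

```python
# How big is "the training data", really? Assumed numbers for a
# hypothetical frontier-scale run; nothing here is a published spec.
tokens = 15e12          # ~15T training tokens
bytes_per_token = 4     # rough average for raw UTF-8 text
corpus_tb = tokens * bytes_per_token / 1e12
print(f"~{corpus_tb:.0f} TB of raw text")  # -> ~60 TB
```

Even tripled to allow for dedup scratch space and checkpoints, that's a handful of NVMe drives, not a fleet-wide flash build-out.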