• brucethemoose@lemmy.world · 16 hours ago

    I wouldn’t use the word “desperate.”

    Scaling is inefficient.

    For training, it takes a ton of work to even get half-decent utilization across a bunch of servers, and it makes any sort of experimentation with architectures immensely more difficult.

    Hence the allegations that some GPUs are assigned “busywork” just to meet utilization quotas from the hardware seller.

    For inference, scale isn’t so important. But the demand for tokens is self-inflicted: it comes from Meta shoving chatbots into random places in its software, and from its architecture being archaic and inefficient.


    In other words, none of this has to be. It’s just the whims of one insecure man, surrounded by sycophantic tech bros, who’s feeling FOMO but doesn’t understand transformer LLMs at all.

    If he had half a brain, he wouldn’t have fired the team that literally founded the open-weights LLM space.

    But he’s also too rich to ever feel the consequences of bad decisions now.