Definition of “can dish it out but can’t take it”

  • humanspiral@lemmy.ca · 2 hours ago

    This is an old accusation (1 year = 1000 years in AI time), and it isn’t relevant to the expected upcoming DeepSeek breakthrough model. Distillation is used to make smaller models, and those are always worse than models trained on open data. Distillation is not a common technique anymore, though it’s hard to prove that more tokens wouldn’t be a “cheat code.” (For anyone unfamiliar with the term, a sketch of the technique follows.)
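    For context on what “distillation” means here, this is a minimal sketch of the classic knowledge-distillation loss (Hinton et al., 2015), assuming a PyTorch setup where a small student model is trained to mimic a larger teacher. The function name and the hyperparameter values (`T`, `alpha`) are illustrative assumptions, not anything from DeepSeek’s actual pipeline:

    ```python
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Blend soft targets from the teacher with hard ground-truth labels."""
        # Soft targets: KL divergence between temperature-scaled distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # rescale by T^2 to keep gradient magnitudes comparable
        # Hard targets: ordinary cross-entropy on the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard
    ```

    The point of the temperature `T` is to soften the teacher’s output distribution so the student can learn from the relative probabilities of wrong answers, not just the top prediction.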

    This is more of a desperation play from the US model makers, even as YouTube is in full “buy $200/month subscriptions now or die” mode.