• [object Object]@lemmy.ca · 6 hours ago

    Even with a bitnet, it’s almost definitely better to train at high floating-point precision and then refine the weights down to bits.

    I would expect bitnet to require more layers for equivalent quality too.
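    For context on what “refine down to bits” means here: BitNet-style models map high-precision weights to ternary values {-1, 0, +1} with a per-tensor scale. A minimal sketch of that absmean quantization step (function and variable names are mine, not from the thread):

    ```python
    import numpy as np

    def ternary_quantize(w, eps=1e-8):
        """BitNet-style absmean quantization: map float weights to
        {-1, 0, +1} times a per-tensor scale."""
        scale = np.abs(w).mean() + eps           # absmean scale
        q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
        return q, scale

    # Weights are trained in high precision first...
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = ternary_quantize(w)
    w_hat = q * s  # the low-bit approximation actually served
    ```

    The point of the comment stands: the gradient updates happen on `w`, and the ternary `q` is only what you deploy.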

    • brucethemoose@lemmy.world · 6 hours ago

      I just meant for mass inference serving.

      Yeah, I haven’t seen much in the way of bitnet training savings yet; it looks like regular old QAT. It does appear that DeepSeek is fine-tuning their MoEs in a 4-bit format now, though.
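
      For readers unfamiliar with why QAT doesn’t save training compute: the forward pass uses a quantize-then-dequantize ("fake quant") view of the weights, but the master weights and gradients stay in full precision. A minimal sketch of 4-bit symmetric fake quantization (my own illustration, not DeepSeek’s actual recipe):

      ```python
      import numpy as np

      def fake_quant(w, bits=4, eps=1e-8):
          """Uniform symmetric fake quantization as used in QAT:
          quantize then immediately dequantize, so the forward pass
          sees quantization error while weights remain floats."""
          qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
          scale = np.abs(w).max() / qmax + eps
          return np.round(w / scale).clip(-qmax, qmax) * scale

      w = np.random.randn(64).astype(np.float32)
      w_q = fake_quant(w, bits=4)
      # In a real framework, gradients flow through the rounding via
      # the straight-through estimator: d(fake_quant)/dw is taken as 1.
      ```

      So every step still does full-precision math on the master weights, which is why the training cost matches ordinary high-precision training.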