inari@piefed.zip to Technology@lemmy.world · English · 7 hours ago
DeepSeek ditches Nvidia for Huawei chips in V4 launch (cybernews.com) · 24 comments
[object Object]@lemmy.ca · 6 hours ago:
Even with a bitnet, it’s almost definitely better to train on a high-precision float and then refine down to bits. I would expect bitnet to require more layers for equivalent quality, too.
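A minimal sketch of what "refine down to bits" could look like, assuming BitNet b1.58-style absmean ternary quantization (the function name and the 1e-8 epsilon are illustrative choices, not anything from the thread): high-precision weights are rescaled by their mean absolute value and rounded to {-1, 0, +1}.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a float weight tensor to ternary values plus one scale.

    Sketch of absmean ternary quantization: divide by the mean absolute
    weight, round, and clip to {-1, 0, +1}. At inference the layer uses
    q * scale as its (approximate) weights.
    """
    scale = np.mean(np.abs(w)) + 1e-8  # epsilon avoids division by zero
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

# High-precision weights from (hypothetical) full-precision training:
w = np.random.randn(4, 4).astype(np.float32)
q, s = ternary_quantize(w)
w_approx = q * s  # what the bit-level model actually computes with
```

The information lost in this rounding step is why post-hoc conversion tends to need extra capacity (more layers) to match full-precision quality.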
brucethemoose@lemmy.world · 6 hours ago:
I just meant for mass inference serving. Yeah, I haven’t seen much in the way of bitnet training savings yet, like regular old QAT. It does appear that DeepSeek is fine-tuning their MoEs in a 4-bit format now, though.