SuspiciousCarrot78@aussie.zone · 1 month ago

Do you host your own AI?

plasma8726@lemmy.today · 1 month ago

Thanks! I’ll look into this. I’m a bit limited with only 12GB of VRAM right now.

brucethemoose@lemmy.world · 1 month ago

A 3060?

Exllama/TabbyAPI is still worth looking at if you are trying to run a model entirely in VRAM. It’s easily the most VRAM-efficient backend. It just doesn’t support CPU offloading (which is useful for MoEs if you have a lot of spare system RAM), and it is more optimized for Nvidia RTX 4xxx-series cards and newer.

TabbyAPI also has a Docker container you can use. Look for “exl3” models on Hugging Face.
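For a rough idea of what fits entirely in 12GB of VRAM, here is a back-of-envelope sketch. It assumes quantized models are described in bits per weight (bpw). It also assumes a flat 2 GiB of headroom for KV cache and runtime overhead, which is a guess, since real usage depends on context length and the backend. The helper names and the model sizes in the loop are hypothetical examples, not anything TabbyAPI provides.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float = 12.0, headroom_gib: float = 2.0) -> bool:
    """Rough check: weights plus an ASSUMED fixed headroom for KV cache/overhead."""
    return weights_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


# Hypothetical model sizes and quant levels, just to show the arithmetic.
for params, bpw in [(8, 6.0), (14, 4.0), (24, 4.0), (32, 3.0)]:
    size = weights_gib(params, bpw)
    print(f"{params}B @ {bpw} bpw: ~{size:.1f} GiB of weights, "
          f"fits a 12 GiB card: {fits_in_vram(params, bpw)}")
```

With these assumptions, an 8B model at 6 bpw (~5.6 GiB) or a 14B model at 4 bpw (~6.5 GiB) fits comfortably, while a 24B model at 4 bpw (~11.2 GiB of weights alone) does not once headroom is counted. That is where CPU offloading, which Exllama lacks, would come in.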