• 7 Posts
  • 36 Comments
Joined 4 years ago
cake
Cake day: November 25th, 2022

help-circle

  • robber@lemmy.mltoSelfhosted@lemmy.worldHardware for local inference?
    link
    fedilink
    English
    arrow-up
    7
    ·
    edit-2
    12 days ago

    To add some practical advice:

    It depends on what you mean by more advanced models. I run Qwen3.6-27b on 48GB VRAM across 3 cards (RTX 2000e Ada), and with the recent software optimizations merged into llama.cpp (tensor parallelism & MTP) I get around 30 tokens per second in generation. I use the model through openwebui for (agentic) web research and simple Q&A mostly and I’m quite happy with what it can do.

    If you want something similar, maybe look at one or two second hand V100 PCIE 32GB. Or something from the Intel Arc Pro series, if you don’t mind the software support lacking behind a bit (as in less optimized).

    Also it might be worth reading into the difference of dense vs MoE models, if you’re new to that. For MoE models, if your system RAM is fast enough, it’s often viable to offload the “experts” (largest parts of such models) to RAM, reducing VRAM capacity needs. Note that server motherboards with e.g. octa-channel RAM have a huge advantage over consumer boards (making DDR4 interesting despite slower speed per module).

    And to adress your last question, while I have no direct experience, I’ve seen posts online about people connecting Strix Halo or DGX Spark devices, but usually via a 10+Gbit/s switch as interconnect is crucial (except if you just want to load balance).

    Self-hosting LLMs is a very fun thing to do, but also a time- and money-consuming rabbit hole. You might wanna check out the LocalLlama community over at shitjustworks.

    Edit: typos







  • It really depends on how much you enjoy to set things up for yourself and how much it hurts you to give up control over your data with managed solutions.

    If you want to do it yourself, I recommend taking a look at ZFS and its RAIDZ configurations, snapshots and replication capabilities. It’s probably the most solid setup you will achieve, but possibly also a bit complicated to wrap your head around at first.

    But there are a ton of options as beautifully represented by all the comments.














  • And openwrt is capable enough?

    Yeah it’s insane right? Every address is reachable when I open a port range. And it’s like there are ~ 10 predefined services (HTTP/S, SMTP, …) and the category “All other ports” where also 22 is part of. So I really have the choice to either keep everything shut or leave everything wide open.

    I think I can’t use my own modem but I’ll have to double check with my ISP. But yes the Wi-Fi is also provided by that router and it’s also quite crappy.