What’s going on on your servers?

I had to bite the bullet and buy new drives after the old ones filled up. I went for used enterprise SSDs on eBay and eventually found some at an okay price, though it was much more than the last time I bought some. Combined with Hetzner’s hefty price increase some months ago, my hobby has become a bit more expensive again, thanks to the ever-growing appetite of companies building more data centers to churn through more energy.

Anyways, the drives are in, and my Ansible playbook to properly encrypt them and make them available in Proxmox worked, so that was smooth (ignoring the part where I pulled the Lenovo Tiny from the rack, opened it, swapped the SSD out and in, closed it, and put it back, only to realize I had put the old SSD in again).
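For anyone curious what such a playbook looks like: here’s a minimal sketch of the kind of tasks mine boils down to, assuming LUKS via the `community.crypto` collection — the device path, mapper name, and vault variable are placeholders, not my actual setup:

```yaml
# Hypothetical task sketch: encrypt a new SSD with LUKS, open it,
# and put a filesystem on it so Proxmox can use the mapper device.
- name: Create and open a LUKS container on the new SSD
  community.crypto.luks_device:
    device: /dev/disk/by-id/ata-EXAMPLE_SSD   # placeholder device path
    state: opened
    name: crypt-ssd1                          # placeholder mapper name
    passphrase: "{{ vault_luks_passphrase }}" # placeholder vault variable

- name: Put a filesystem on the opened mapper device
  community.general.filesystem:
    fstype: ext4
    dev: /dev/mapper/crypt-ssd1
```

From there it’s just a matter of adding `/dev/mapper/crypt-ssd1` as storage in Proxmox.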

Any changes in your hardware setups? Did the price increase make you reconsider some design decisions? Let us know!

  • TheRagingGeek@lemmy.world · 8 hours ago

    So this week I was getting ready for my workday when my son tells me CraftyController is inaccessible. I tried to SSH into the box that the service is pinned to… nada, dead. Tried to power cycle it, nada.

    Now, this node was a B450M-A mobo / Ryzen 7 2700X platform with some hodgepodge scrap RAM I’d had running in it (RAM birthday was 2019). I hooked it up to a mini monitor and a keyboard, but it didn’t POST at all, just a blue screen of no signal. Unfortunately the B450M-A didn’t feature POST debug lights, nor did it have Q-LED; it apparently relied on a PC speaker, and my machine wasn’t telling any tales. Since I had no real idea as to the root cause, and reseating the RAM and the GPU and fiddling with it got me nowhere, I got my partner to approve the spend on a replacement motherboard so that I could have actual debug indicators.

    Thursday the ROG B550-F Gaming WIFI II mobo arrived, as did the Ryzen 9 5900XT and the Nautilus 360RS cooler. I spent the evening assembling the mobo, CPU, GPU, RAM, and all the related wiring, figuring I’d do the cooler the next day. Yesterday I got the cooler in place with some serious hardware acrobatics, then fired it up and… yellow LED: DRAM issue. So I pulled all of the RAM and plugged in one of the hodgepodge sets (I had 4x8GB sticks); neither set worked, so I went to trying a single stick at a time. Of the 4 sticks, only 1 was able to get past the yellow LED and into a completed POST.

    So the RAM was shot, and I’m not going to run containers on a machine with only 8 GB of RAM, so I ordered up some Vengeance LPX 2x16GB sticks and they arrived this morning! I just finished slotting them in and then wrestling with Gentoo’s understanding of where all the hardware was. It was a lot of fiddling with the Gentoo kernel config and installing the NVIDIA drivers, but after all of that was done, the system booted up successfully! I’ve now got it back in its residence, connected up to UPS power, and I’m about to shunt Docker containers back to the newly improved machine with 2x the CPU capacity.

    It was a wild ride, but the cool part is that when the system shat itself, it was part of a 3-node Docker Swarm, and I had recently migrated to a NAS for persistence of my container data. The other 2 nodes aren’t as overbuilt as this thing, so I did have to do some memory wrangling and disable my lower-priority services in order to restore service, but I was able to ensure all necessary services kept running during the outage, and I got some learning in from a couple of services that didn’t port as cleanly as I would’ve liked. All in all, fun times in system administration! lol
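    In stack-file terms, the memory wrangling amounted to settings like these — a sketch only, with made-up service names and limits, not my actual stack:

    ```yaml
    # Hypothetical Swarm stack fragment: cap memory on the must-run
    # service so it fits on a smaller node, and park a low-priority
    # service by scaling it to zero until the big node is back.
    services:
      crafty:
        image: example/crafty:latest      # placeholder image
        deploy:
          replicas: 1
          resources:
            limits:
              memory: 1G                  # cap to fit the smaller nodes
      media-indexer:
        image: example/indexer:latest     # placeholder low-priority service
        deploy:
          replicas: 0                     # parked during the outage
    ```

    With container data on the NAS instead of local disks, Swarm could reschedule the survivors onto the other nodes without losing state.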