Woke up today to the homeserver being unresponsive. Couldn’t SSH, no video out when I connected a monitor, and even the reset button didn’t do anything. Had to hold the power button to shut it down.

/var/log/syslog doesn’t show anything interesting, other than that the issue happened just after 4am. Log:

2026-02-27T03:55:01.481794-08:00 blackbox CRON[1743418]: (www-data) CMD (/usr/bin/php8.3 /mnt/MONSTERDRIVE/pixelfeddata/pixelfed/artisan schedule:run >> /dev/null 2>&1)
2026-02-27T04:00:00.198504-08:00 blackbox smartd[2126]: Device: /dev/sdd [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
2026-02-27T04:00:00.291853-08:00 blackbox systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
2026-02-27T04:00:00.298344-08:00 blackbox systemd[1]: sysstat-collect.service: Deactivated successfully.
2026-02-27T04:00:00.298523-08:00 blackbox systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
2026-02-27T04:00:00.299608-08:00 blackbox kernel: kauditd_printk_skb: 8 callbacks suppressed
2026-02-27T04:00:00.299613-08:00 blackbox kernel: audit: type=1130 audit(1772193600.298:798916): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
2026-02-27T04:00:00.299615-08:00 blackbox kernel: audit: type=1131 audit(1772193600.298:798917): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
2026-02-27T04:00:01.923610-08:00 blackbox kernel: audit: type=1101 audit(1772193601.922:798918): pid=1744810 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='op=PAM:accounting grantors=pam_permit acct="www-data" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
2026-02-27T04:00:01.923614-08:00 blackbox kernel: audit: type=1103 audit(1772193601.922:798919): pid=1744810 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="www-data" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
2026-02-27T04:00:01.923615-08:00 blackbox kernel: audit: type=1006 audit(1772193601.922:798920): pid=1744810 uid=0 subj=unconfined old-auid=4294967295 auid=33 tty=(none) old-ses=4294967295 ses=50544 res=1
2026-02-27T04:00:01.923615-08:00 blackbox kernel: audit: type=1300 audit(1772193601.922:798920): arch=c000003e syscall=1 success=yes exit=2 a0=7 a1=7fff81d75200 a2=2 a3=0 items=0 ppid=2654 pid=1744810 auid=33 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=50544 comm="cron" exe="/usr/sbin/cron" subj=unconfined key=(null)
2026-02-27T04:00:01.923616-08:00 blackbox kernel: audit: type=1327 audit(1772193601.922:798920): proctitle=2F7573722F7362696E2F43524F4E002D66002D50
2026-02-27T04:00:01.924259-08:00 blackbox CRON[1744811]: (www-data) CMD (/usr/bin/php8.3 /mnt/MONSTERDRIVE/pixelfeddata/pixelfed/artisan schedule:run >> /dev/null 2>&1)
2026-02-27T04:00:01.924614-08:00 blackbox kernel: audit: type=1105 audit(1772193601.923:798921): pid=1744810 uid=0 auid=33 ses=50544 subj=unconfined msg='op=PAM:session_open grantors=pam_loginuid,pam_env,pam_env,pam_permit,pam_umask,pam_unix,pam_limits acct="www-data" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
2026-02-27T04:00:01.925610-08:00 blackbox kernel: audit: type=1110 audit(1772193601.924:798922): pid=1744811 uid=0 auid=33 ses=50544 subj=unconfined msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="www-data" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
2026-02-27T04:00:02.357616-08:00 blackbox kernel: audit: type=1104 audit(1772193602.356:798923): pid=1744810 uid=0 auid=33 ses=50544 subj=unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
2026-02-27T09:23:35.786375-08:00 blackbox systemd-modules-load[904]: Inserted module 'dm_multipath'
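
To pin down when logging actually stopped, a rough Python sketch like the one below can scan a log for the largest jump between consecutive timestamps. It assumes every entry starts with an ISO 8601 timestamp like the lines above; the function name `largest_gap` is made up for illustration and is not part of any tool.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (gap_seconds, last_before, first_after) for the biggest
    jump between consecutive timestamps, or None if fewer than two
    timestamped lines were found."""
    best = None
    prev = None
    for line in lines:
        # Assumption: the timestamp is the first space-separated field.
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a leading ISO timestamp
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev, ts)
        prev = ts
    return best
```

Calling it as `largest_gap(open("/var/log/syslog"))` (reading the file may need root or membership in the `adm` group) would, on the excerpt above, report the jump from 04:00:02 to 09:23:35. That only shows when logging stopped, not why.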

Would something like this point to a direct hardware failure, like a power supply hiccup? The 4am timing coincides with my electric car starting to charge, but the server is on a dedicated 20A circuit and behind a battery backup. I also don’t see any power issues on my Sense monitor at that time, though it has limited resolution.

Mainboard is a Supermicro H13SAE-MF and I’m using ECC RAM.
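
Since ECC is in play, one thing worth checking is whether the kernel has logged any memory errors. The sketch below reads the Linux EDAC counters from sysfs. It assumes an EDAC driver is loaded for this platform, so that `/sys/devices/system/edac/mc` exists; the function name `edac_counts` is hypothetical.

```python
from pathlib import Path


def edac_counts(root="/sys/devices/system/edac/mc"):
    """Map each memory controller name (mc0, mc1, ...) to its
    (corrected, uncorrected) error counts as reported by EDAC."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # controller directory without readable counters
        counts[mc.name] = (ce, ue)
    return counts
```

An empty result most likely means no EDAC driver is loaded. The counters also reset on reboot, so they only cover errors since the last boot and would not capture whatever happened right before a hard hang.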

I’ve been running this hardware for over a year and never had this issue, but I’m running out of places to look.

Might be time to finally get IPMI working.

  • 9tr6gyp3@lemmy.world
    4 hours ago

    Solar flares and even the occasional stray neutron hitting your equipment can cause some weird issues. If it’s just a one-time occurrence and it doesn’t happen again, I wouldn’t worry too much about it.

    • Wildmimic@anarchist.nexus
      3 hours ago

      Wanted to say that - random shit does happen, even to the most stable systems. There’s a cutoff in consumer hardware where paying for more stability, such as radiation hardening, simply isn’t worth the cost. The best you can do is ECC RAM.