Slrpnk.net admin here.
The failure seems to have been in the main firewall. If it had been the server itself, we could easily have restored it from the backups on another machine. But as it stands, remote access is entirely cut off.
There is usually another person with hardware access, but they are on summer holidays. This seemed like an acceptable risk at the time…
An off-site backup would have been nice of course, but due to the costs involved in running a Lemmy instance of that size on a rented server, it would not have been a great option either.
I have plans to add a KVM to the main firewall via a secondary connection, but even that might not have helped in this case. I’ll know more when I have physical access again.
Is it run out of a private residence? How could it happen if it’s in a real data center…?
It is run from a private residence in the DIY punk spirit (and this also allows us to run off a local solar PV system), but more or less the same would happen if you rented rack-space in a “real” data-center. Only if you rent a managed server or VPS is someone else responsible for fixing such issues, and that comes at a significantly higher cost at the scale we operate at (slrpnk is part of a bigger project that also hosts other services).
I’ve done a lot of SysAdmin and DCOps work in the past, so I thought I’d give you some plausible suggestions (I haven’t dug deep into the Lemmy DB side or the DNS/federation parts of the stack, so I’m not sure all of this is practical).
Scenario 1 - Preserve and merge when access is restored

Setup

Spin up two VMs/VPS (or one with enough grunt for two Lemmy servers). Call them robak.slrpnk.net and slrpnk.net and point DNS appropriately.

Pull federated content from other instances and place it on robak, set as read-only.

Sync important comms to the (new) slrpnk.net without content.

Allow users to sign up, vetting them as far as possible (all mods). Keep a list of those that are vetted (call it vetted.list). Inform all users that any non-vetted users will have their content dropped when access is restored.

Merge!

Once access is restored, ensure that the (old) slrpnk.net is set to read-only.

Schedule a maintenance window (announce more time than you are likely to need).

During the maintenance window, put the (new) slrpnk.net into R/O, or just block external access.
Scenario 2 - Server is in DC or Admin able to facilitate access
Appreciate the answer and the detail. Good luck getting it all resolved.