I made an oopsie
My apologies to everyone for the extended outage yesterday.
For background, 2 of the hard drives in the storage array this site uses have gone bad. I got replacements in and set to work migrating the data.
Unfortunately I was overconfident in the process, since I had never actually needed to perform such a migration before, let alone in a live environment that other people rely on.
I made assumptions about how the tools would behave when swapping out the drives (I used 'pvmove'...) and didn't realize that the tool I picked would effectively lock up the entire filesystem until it was done. (It took ~10 hours to transfer a single disk...)
This was made worse by the fact that the second replacement drive was DOA. It actively prevented the system from booting, so I spent a couple of hours troubleshooting to make sure I hadn't knocked something loose before realizing the system was simply rejecting the new drive.
There's sadly more downtime to come before this is resolved, *but* it should be drastically shorter. Next time I'll use a different tool that can transfer the data without locking up the system, so the downtime will just be 2 reboots (1 to put in the new drive, 1 to take out the old one).
The replacement for the replacement drive will arrive on Tuesday.