I made an oopsie
My apologies to everyone for the long extended outage yesterday.
For background, 2 of the hard drives in the storage array this site is using have gone bad. I got replacements in and set to work migrating storage.
Unfortunately I was overconfident in the process as I had never actually needed to perform such a migration before, let alone in a live environment used by others.
I made assumptions in how the tools would operate in swapping out the drives (I used 'pvmove'...) and didn't realize that the tool I selected would lock the entire filesystem until it was done. (it took ~10 hours to transfer a single disk...)
This was made worse by the fact that the second replacement drive was DOA (it actively prevented the system from booting, so I spent a couple hours troubleshooting that before I realized I hadn't knocked something loose... the system was basically just rejecting the new drive).
There's sadly more downtime to come before this is resolved *but* it should be drastically shorter. Next time I'll be using a different tool to transfer without locking the system, so the downtime will just be 2 reboots (1 to put in the new drive, 1 to take out the old).
The replacement to the replacement drive will be arriving on Tuesday.
Reduced Performance / Reliability
One of the servers went down from hardware failure, thankfully since I run this across multiple boxes with failover it means the site is (obviously) still up.
It might occasionally get a little spotty on connection and especially on performance until that server gets replaced as it means the remaining server is a tad bit overloaded.
It'll probably be a few weeks unfortunately as I don't have the spare funds to pre-purchase a replacement (the protection plan I purchased will cover it, but I've got to mail off the unit, wait for the money, then wait for financial stresses to pass enough that I can order the replacement... then a good week or two delivery time after that)
