Outage - Database Stuck
... I can't go a single day without issues, apparently...
While I was asleep, it looks like the database got stuck in some sort of optimize process, with everything else stuck waiting on it.
Restarting the database server forced that process to clear and cleaned things up.
I suspect I got overzealous after things started working great and set the background worker count too high (these workers run automatically in the background, doing things like updating contacts, downloading posts, etc.).
I turned that setting down and increased the number of connections the database allows.
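Why both fixes help comes down to simple arithmetic. Every background worker and every web server process that talks to the database holds a connection. Once the total exceeds the database's connection limit, new connections get refused or have to wait.

Below is a minimal sketch. It assumes each worker holds exactly one connection. All names and numbers are hypothetical and are not this server's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class ConnectionBudget:
    """Hypothetical model of who holds database connections at peak load."""
    webservers: int            # web server instances behind the load balancer
    web_conns_per_server: int  # connections each web server may open
    background_workers: int    # background workers, assumed 1 connection each
    reserved: int = 5          # headroom for admin/maintenance sessions

    def peak_demand(self) -> int:
        # Total connections that could be open at the same time.
        return (self.webservers * self.web_conns_per_server
                + self.background_workers + self.reserved)

    def fits(self, max_connections: int) -> bool:
        # True if peak demand stays within the database's connection limit.
        return self.peak_demand() <= max_connections


# Illustrative numbers only.
too_many_workers = ConnectionBudget(webservers=2, web_conns_per_server=40,
                                    background_workers=80)
tuned = ConnectionBudget(webservers=2, web_conns_per_server=40,
                         background_workers=20)

for name, budget in [("too many workers", too_many_workers), ("tuned", tuned)]:
    print(name, budget.peak_demand(), budget.fits(max_connections=150))
```

With these made-up numbers, 80 workers push peak demand to 165, past a limit of 150. Cutting workers to 20 brings it down to 105. Raising the limit to 200 would also cover the original 165, which is why turning one knob down and the other up both buy breathing room.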
Dear god, I feel like a newb... I used to do this stuff professionally, and I can really feel that it's been years.
The biggest performance issue I've had with this server for a while turned out to be caused by the hypervisor copying the MAC address when I cloned the server. It shouldn't have taken me nearly this long to identify this problem!
Aaaargh!
Performance Issues - Tentatively Resolved?
I feel really dumb for not noticing the cause sooner. I mostly attribute it to how rarely the problem occurred and how inconsistently it showed up.
I run my servers in a virtual server environment, and one of my measures to improve reliability and performance was to put two instances of the webserver behind a load balancer. In layman's terms: whenever you connect, you're assigned to whichever instance has the fewest connections, and the two are otherwise identical (same files, same database, etc.).
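Least-connections balancing, as described above, can be sketched in a few lines. This is a minimal illustration that assumes the balancer only tracks an open-connection count per backend. The names web1/web2 and the class names are hypothetical. Real load balancers layer health checks, weights, and timeouts on top of this.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active: int = 0  # connections currently open to this backend


@dataclass
class LeastConnectionsBalancer:
    backends: list[Backend]

    def acquire(self) -> Backend:
        # Pick the backend with the fewest active connections;
        # min() keeps the first one listed when there is a tie.
        chosen = min(self.backends, key=lambda b: b.active)
        chosen.active += 1
        return chosen

    def release(self, backend: Backend) -> None:
        # Called when a client's connection to that backend closes.
        backend.active -= 1


lb = LeastConnectionsBalancer([Backend("web1"), Backend("web2")])
first = lb.acquire()   # tie at 0, so web1
second = lb.acquire()  # web1 has 1, so web2
lb.release(first)      # web1 drops back to 0
third = lb.acquire()   # web1 again
print(first.name, second.name, third.name)  # web1 web2 web1
```

Because both instances are interchangeable, the balancer never needs to care which one a client lands on. That only holds while both instances are actually reachable, which is exactly what broke here.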
Well... it turns out that when I duplicated the server initially, the virtualization software didn't change the MAC address. In layman's terms: the two servers had different IP addresses, but the network uses the MAC address to decide which box gets the traffic for a given IP. If two boxes share a MAC address, traffic sporadically and randomly switches between them. Whenever a packet lands on the wrong box, it's addressed to the other box's IP, so it gets rejected, which means roughly half of the traffic was always being dropped.
So your connection to the load balancer was fine, but the load balancer was struggling to connect to the webservers, and the webservers were struggling to connect to the database.
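The duplicate-MAC failure can be reproduced with a toy model of a learning switch (or virtual bridge). The model rests on three assumptions:

- The switch remembers the port it last saw each MAC address on.
- ARP resolves both webserver IPs to the same shared MAC.
- A host silently drops packets addressed to an IP that isn't its own.

The MAC, IPs, and class names are all hypothetical. The model also ignores ARP timing, flooding of unknown MACs, and real traffic patterns, so it is a sketch of the mechanism rather than a faithful simulation.

```python
from dataclasses import dataclass

SHARED_MAC = "52:54:00:12:34:56"  # hypothetical MAC copied onto both clones


@dataclass
class Host:
    name: str
    ip: str
    mac: str

    def receive(self, dst_ip: str) -> bool:
        # A host only accepts IP packets addressed to its own IP.
        return dst_ip == self.ip


class LearningSwitch:
    """Simplified switch: remembers which port a MAC was last seen on."""

    def __init__(self) -> None:
        self.mac_table: dict[str, int] = {}
        self.ports: dict[int, Host] = {}

    def plug(self, port: int, host: Host) -> None:
        self.ports[port] = host

    def frame_from(self, port: int) -> None:
        # Learning: the sender's MAC is (re)mapped to the port it came from.
        self.mac_table[self.ports[port].mac] = port

    def deliver(self, dst_mac: str, dst_ip: str) -> bool:
        # Frames go to whichever port the MAC was last learned on.
        port = self.mac_table[dst_mac]
        return self.ports[port].receive(dst_ip)


web1 = Host("web1", "10.0.0.11", SHARED_MAC)
web2 = Host("web2", "10.0.0.12", SHARED_MAC)
switch = LearningSwitch()
switch.plug(1, web1)
switch.plug(2, web2)

# Both servers keep sending replies, so the table flips between ports.
results = []
for sender_port in [1, 2, 2, 1, 2, 1]:
    switch.frame_from(sender_port)
    # The load balancer sends one packet to each webserver; ARP gave it
    # the same (shared) MAC for both IPs.
    results.append(switch.deliver(SHARED_MAC, web1.ip))
    results.append(switch.deliver(SHARED_MAC, web2.ip))

print(f"{results.count(False)} of {len(results)} packets rejected")
```

In this toy, every packet sent while the table points at the "other" clone is dropped, so exactly half are rejected. In practice the split depends on who talked last, which fits how sporadic the slowdowns were. Giving the clone its own MAC address makes each table entry unique, and the problem disappears.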
Once I changed that, the server immediately became quite zippy!
My sincere apologies for the impact.
Server News