Skip to main content



Lenin: Civilisation, freedom and wealth under capitalism call to mind the rich glutton who is rotting alive but will not let what is young live on. https://wordsmith.social/protestation/quotes#quote9169


George L. Jackson: Fascism has temporarily succeeded under the guise of reform. The only way we can destroy it is to refuse to compromise with the enemy state and its ruling class. https://wordsmith.social/protestation/quotes#quote9170



Short Planned Maintenance Tonight


My apologies if this is inconvenient, I opted to do it on shorter notice without a set hour because (a) there's not a lot of activity on the server and (b) I'm really impatient.

I'm doing a hardware upgrade that requires rebooting the network storage backend which will bring down everything for a short time. It should take well under 30 minutes to do the hardware swap and most of the downtime is just going to be the database starting back up (which often takes in the range of another 30 minutes).

As part of this I'll also be deploying some software updates that require a reboot to take effect.



WTF?


I honestly haven't the foggiest idea how this happened, but apparently the DNS settings got changed a few days ago on the servers with absolutely no explanation (and to junk nonsense settings for some reason). I'm going to keep an eye on them to make sure they don't change again.

Additionally I think that created a cascade that caused the other problems.

Any posts you've made over the past 2-3 days haven't been sent to other servers, but will start sending now.

As far as the other problems, I think when that happened it caused so many processes to lag and take way longer and more resources than usual as any time it tried to contact another server it timed out on the dns request.



DOS Overload


There's been some recent outages of the server, the root cause I've tracked down to the server getting overloaded with requests (mostly updates from other servers). Those updates have been coming in faster than the server can process them and preventing other requests from coming through.

I've made some tweaks that I believe have resolved it, fingers crossed.

Technical explanation:

The servers ran out of php-fpm threads to handle requests. It was configured with static count of 30 each (60 total). They were definitely impacted significantly by memory leaks which kept the count low.

I've changed it from static to ondemand and increased the count to 100 each, I'll probably go in and increase it again since it's still pegged at that limit almost constantly. But thankfully running on-demand seems to be keeping the memory usage per thread drastically lower.

Where the static assignment of 30 was eating up 8GB of ram, 100 on-demand threads is only taking up 1.3GB.

I'm going to increase it until it's either hitting memory constraints or it's no longer constantly at full capacity.

in reply to Server News

There's definitely some sort of time and code problem involved as it hit again this morning even with the previous changes, though this time it only impacted updates (making posts/comments/likes, getting new posts). I think reading was unaffected because those operations are faster and require significantly less memory.

For whatever reason, sometime around midnight the server gets hit with a bunch of requests that all seem to lock up, eating up large quantities of memory and then won't exit. (With on-demand the threads exit after 10s of being idle, there was over 100 threads running continuously from midnight until I killed them around 9am). Likewise there was a very massive flood of updates from other servers corresponding to that, so I think it might just be a bunch of large servers sending bulk updates or some such.

New tuning to handle that: I put firmer time limits into PHP to prevent threads from running forever, there's two options for setting max times and the first was getting ignored (I think friendica overrode it? the second should override that and kill any threads going too long)

In addition to that, I set up a rate limiter to the inbox endpoint (where other servers send updates to), this should help keep that from overloading the server (majority of the time it'll just be slowing them down by a second or two unless the server is overloaded, at which point the rate limit should help get it accessible for users)