Outage - Database Stuck
... I can't get 1 day without issues apparently...
While I was asleep, it looks like the database got stuck in some sort of optimize process, with everything else stuck waiting on it.
Restarting the database server forced that process to clear and cleaned things up.
I suspect what happened was I got overzealous after things started working great and I set the background worker count too high (these workers automatically run in the background doing things like updating contacts, downloading posts, etc).
I tuned that setting down, and increased the number of connections the database allows.
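As a rough illustration of why those two settings pull against each other, here is a minimal sketch of the connection arithmetic. It assumes each background worker and each PHP process holds at most one database connection; the function name and all of the numbers are hypothetical, not this server's real settings.

```python
def connection_headroom(max_connections, web_nodes, php_children_per_node,
                        background_workers, reserved=5):
    """Connections left over at peak load (negative means requests will queue or fail).

    Assumption: every PHP child and every background worker can hold one
    database connection at the same time. `reserved` keeps a few slots free
    for admin logins and backups.
    """
    peak = web_nodes * php_children_per_node + background_workers
    return max_connections - reserved - peak


# Hypothetical numbers: too many workers for the connection limit...
print(connection_headroom(100, web_nodes=2, php_children_per_node=40, background_workers=30))  # -15
# ...versus fewer workers and a higher limit.
print(connection_headroom(150, web_nodes=2, php_children_per_node=40, background_workers=20))  # 45
```

A negative result is the "everything stuck waiting" situation: more processes want a connection than the database will hand out.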
Dear god I feel like a newb... I used to do this stuff professionally and I feel the fact that it's been years.
Biggest performance issue I've had for a little while with this server turns out to be because the hypervisor copied the MAC address when I copied the server. It shouldn't have taken me nearly this long to identify this problem!
Aaaargh!
Performance Issues - Tentatively Resolved?
I feel really dumb for not noticing the cause sooner. I mostly attribute it to the rarity of the problem and the inconsistency with which it occurred.
I use a virtual server environment for my servers, and one of my measures to improve reliability and performance was to have two instances of the webserver behind a load balancer. In layman's terms, whenever you connect you're assigned to whichever instance has the fewest connections, and the two are otherwise identical (same files, same database, etc).
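The "fewest connections" rule can be sketched in a few lines. This is only an illustration of the idea, not the load balancer's actual code, and the backend names `web1` and `web2` are made up.

```python
def pick_backend(active_connections):
    """Return the name of the backend with the fewest active connections.

    `active_connections` maps backend name -> current connection count.
    On a tie, the backend listed first wins.
    """
    if not active_connections:
        raise ValueError("no backends configured")
    return min(active_connections, key=active_connections.get)


connections = {"web1": 12, "web2": 7}
chosen = pick_backend(connections)
connections[chosen] += 1  # the new request now counts against that backend
print(chosen)  # web2
```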
Well... turns out that when I initially duplicated the server, the software decided not to change the MAC address. In layman's terms: the IP address of each server is different, but the network uses the MAC address to map IP addresses to boxes. If two boxes have the same MAC address, traffic is going to sporadically and randomly switch between them, and whatever lands on the box with the wrong IP address gets rejected, which means half of the traffic is always getting rejected.
So your connection to the load balancer was fine, but it was struggling to connect to the webservers and the webservers were struggling to connect to the database.
Once I changed that the server immediately became quite zippy!
My sincere apologies for the impact.
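For anyone cloning virtual machines, a check like the following would catch this early. It's a minimal sketch that assumes you can export a list of (hostname, MAC address) pairs from your hypervisor; the hostnames and addresses below are invented for the example.

```python
from collections import defaultdict


def find_duplicate_macs(inventory):
    """Return {mac: [hosts]} for every MAC address used by more than one host.

    MAC addresses are normalized to lowercase with ':' separators so that
    '52-54-00-AA-BB-01' and '52:54:00:aa:bb:01' count as the same address.
    """
    hosts_by_mac = defaultdict(list)
    for host, mac in inventory:
        hosts_by_mac[mac.lower().replace("-", ":")].append(host)
    return {mac: hosts for mac, hosts in hosts_by_mac.items() if len(hosts) > 1}


# Hypothetical inventory: web2 was cloned from web1 and kept its MAC address.
inventory = [
    ("web1", "52:54:00:AA:BB:01"),
    ("web2", "52:54:00:aa:bb:01"),
    ("db1", "52:54:00:aa:bb:02"),
]
print(find_duplicate_macs(inventory))  # {'52:54:00:aa:bb:01': ['web1', 'web2']}
```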
Definitely worth the subscription!
youtu.be/PlCHRkH9tEk?si=IfiUT9…
Welcome To Dropout.tv
What is Dropout.tv? It's too hard to put into words, so we just made this video. For exclusive series like Dimension 20, Game Changer, Make Some Noise, Um Ac...YouTube
Outage - Self Resolved - Investigating
The server went down for a few hours today and resolved before I could look at it. It's also been a really bad day for my physical health so I've had very little capacity.
I am looking into why it happened but have no firm answers at this time.
As far as I can tell, the most obvious culprit is the database backup, which was running at that time.
It looks like the database is big enough that backups are no longer simple and I'll need to change my backup method.
Login Screen Issue
For a little while there was an issue, which I didn't notice until I rebooted my desktop, where visiting foggyminds.com/ would give you the login screen regardless of your login status (unlike foggyminds.com/network or any other address).
Apparently the load balancer was mistakenly caching the root page which caused it to always show the login page.
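The general rule this points at: a shared cache should only store responses that look the same for everyone. Here is a minimal sketch of that decision, assuming the cache can see the request path and whether a session cookie is present. The suffix list and function name are hypothetical, not the real load balancer configuration.

```python
# Assumption: only these file types are treated as public, static content.
STATIC_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".css", ".js")


def should_cache(path, has_session_cookie):
    """Decide whether a response may be stored in the shared cache.

    Pages such as "/" change depending on who is logged in, so they are
    never cached. Requests carrying a session cookie always bypass the cache.
    """
    if has_session_cookie:
        return False
    return path.lower().endswith(STATIC_SUFFIXES)


print(should_cache("/", False))             # False
print(should_cache("/photo/a.jpg", False))  # True
print(should_cache("/photo/a.jpg", True))   # False
```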
Server Performance
Just acknowledging that the server has had some spotty performance recently. I haven't been able to figure out the cause yet, but I'm continuing to investigate.
We now have two load balanced servers, and I can establish that it's not specific to either server. I've updated the load balancer with caching settings which should help alleviate it a little (public images will now get cached and not have to go through PHP and database queries).
The database is showing no performance issues that I can see. When a page lags, the database queries do not.
Additionally, the server lag appears to be random and per request (as in another identical request made at the same time doesn't lag).
My focus is going to be on examining the PHP-FPM service on both nodes, since they're both reporting slow execution.
Nitter Update
You may or may not have noticed that Twitter/X links on this site get rewritten. I have the Nitter addon installed that rewrites twitter.com links to use proxy pages that don't profit Twitter and don't expose you to the rest of Twitter's nonsense.
The Nitter site I was using (notabird.site) is no longer functioning, so I've switched to a similar, compatible site: traittor.net/.
Hardware Status
This looks onerous right now... but I can imagine something like this being incorporated into AC system or dehumidifiers, etc, especially if it gets expanded to an array of diseases...
It doesn't so much prevent disease, but it can at least help with peace of mind and rapid disease control (ie. system goes off, everyone in the area quarantines to limit spread)
Outage / Hardware Failure And Recovery
I'm going to start with my apologies for the 24 hour outage. Thankfully there should be little impact beyond that.
I've been running all of this on a single box, and had plans (last night in fact) to expand it to two with a NAS backend to help protect it from failures and improve performance.
Ironically, the morning in which I planned to make those updates was when I experienced some sort of disk failure on the server. I don't know exactly what happened and the exact level of failure (diagnosis of that will be tonight, mostly to see if I need to get a warranty replacement, or if there was some software cause).
The disk failure presented itself as bad blocks and mild data corruption which prevented multiple services from running.
The good news is that it appears nothing of real importance was corrupted. We lost the majority of one database table, but all that table contained was a list of every ActivityPub contact the server could see, and its contents should be reconstructing automatically now (though that may obviously take a while).
It might be worth checking your friends/followers to make sure there are no major absences, I don't know if this impacts who you're following or just the basic contact info.
Beyond that, here's what's changed and changing followed by accountability for my mistakes:
* The server now has a NAS backend with disk redundancy, which will protect against drive failures. It will also help share resources between this server and a second server once I have the drive issues figured out.
* Once the original server is fixed/cleared it will be running a load balanced second copy of the webserver (and eventually a copy of the database) to improve performance and reliability. (The NAS will allow two copies of the server to share the same media files.)
For accountability:
One of the things that made the failure take longer to recover from was the fact that my backups of the database had failed to run due to a typo.
I could have sworn I had checked it to ensure it was running, but clearly I had not.
I fixed the error and confirmed it ran successfully this morning. Database backups are run twice daily and backed up to a remote **encrypted** backup (borgbase.com). I will be checking periodically over the next week or two to confirm that it continues running.
Additionally, with the NAS now set up and available I am running full system snapshots of the database twice a day as well. This means I should have two avenues for recovery across two different methods going forward, which should significantly increase reliability.
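A backup job that silently stops running is easy to miss, so an automated freshness check helps. The sketch below assumes the dumps land in a local directory before being shipped off (the directory layout, file name, and 13 hour threshold are all assumptions for illustration, chosen to fit a twice-daily schedule).

```python
import os
import tempfile
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_fresh(backup_dir, max_age_hours=13, now=None):
    """True only if a backup file exists and is newer than the threshold."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours


# Demo against a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    print(backup_is_fresh(demo_dir))  # False: no backup has ever run
    dump = Path(demo_dir) / "database.sql.gz"  # hypothetical file name
    dump.write_bytes(b"placeholder")
    print(backup_is_fresh(demo_dir))  # True: a backup was just written
    stale = time.time() - 20 * 3600
    os.utime(dump, (stale, stale))    # pretend the last backup is 20 hours old
    print(backup_is_fresh(demo_dir))  # False: the job has stopped running
```

Run from cron or a monitoring tool, a `False` result would be the signal to go look at the backup job before it's needed.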
Below is the text that was shown on the site during the outage with the updates, just for accountability:
Hardware Failure - Recovery Attempts In Progress
The primary disk used by the webserver, database, and some of my other servers appears to be failing critically.
I am attempting recovery efforts, but also have to work my day job. Automated backups have been running daily on the database so the server should be restorable.
I'll update this page with notes as possible (it's on the same box, so can potentially fail as well). I get off work at 5pm US CT time, this is being written at 9:50am.
If you'd like to reach out, shiri [at] bailem.me for email and shiri:beeper.com for Matrix (if you prefer XMPP shiri_beeper.com@aria-net.org should work to use the public bifrost bridge).
Update 7PM: everything is migrated and recovery is underway. Database is attempting to recover on its own and I'm just waiting on it. Once it's done restoring, I will try to re-enable the server. If the database fails from that point, I'll restore the database from the backup made early this morning.
Update 8PM: Database is just taking a long time to recover, didn't help that default settings timed out the recovery and I had to restart it. Database is running on MariaDB which has the good sense to design with various safety logs, so it's able to backtrack and rerun commands to recover data. I expect it to be back up in the next hour or two.
Update 8:30PM: Didn't help that I had a momentary power outage reset my progress.
Update 9:45PM: Upfront honesty, looks like the backups were broken and I should have taken a closer look at them. However, the database is only mildly corrupted and I'm going to be able to do a dump and restore. This takes time as it's 20GB in size, but I feel confident that it'll be good after this.
Update 1AM: we're just about to the end of it, I just have to figure out why I'm getting an odd session data error? Unfortunately I have to sleep and that means calling it for the night.
Looks like there are bigger issues with the lost APContact table. I'm investigating to see what I can do about that.
*Hopefully* this is self-resolving as it refinds all the users.
Fuzzy Thumbnails
I'm not entirely certain what's causing this, but I do know what kicked it off.
I'm doing some migrations of the media files between boxes and it seems to have somehow messed up the thumbnails and some cached remote images. However, I've seen no problems with any uploaded files (aside from thumbnail views of those files).
This appears to be resolving itself over time as the server updates contacts and recaches many of these files.
Migrations should be finished in the next day or two and there should be no significant downtime.
@Friendica Admins I could use some help cleaning up something on my instance.
I'm moving around the storage, and for some reason a lot of the thumbnails (not all) seem to have gotten corrupted. Regular files seem fine, but basically full size profile pictures are good but small versions are blurry.
Is there any good way to clear the storage of everything that's cached both in terms of thumbnails and data from other servers?
I live together with my sibling in a two bedroom, I'm the only one with a consistent income and doing my best to care for them. They struggle with borderline personality disorder, ptsd, bipolar, and bad anxiety. They aren't able to maintain a stable living situation on their own.
They've been in a relationship for most of the past year with someone who was great on the surface but insecure and shitty about it underneath. They kept taking their insecurities out on my sibling and isolating them, and started helping them with their meds and then stopped multiple times making their situation so much worse (and getting shitty if anyone else helped, so I wasn't even in the loop to help because he would have shit on them for even telling me about it).
The break up went catastrophically bad because he kept escalating until my sibling was in the worst episode I've ever seen and eventually antagonized them to the point of shoving him... then the cops showed up and arrested my sibling. (seriously, ACAB.)
Long story short, skipping a lot of detail elements, this has put us in the position to be a few hundred dollars behind in budget to cover rent and expenses, especially the court costs (ex isn't pressing charges, but in Texas that doesn't matter apparently). They have to get into alcohol counseling (they were in a bad way and went out with friends that night, the bar severely overserved them), as well as get back on their meds, and pay their bond... all within the next few days.
We've got leads on a lot of resources, but it's not going to make ends meet in the short term and we could really use help.
If you could send some money to help out, the best method is Cash App to $ShiriBailem (benefits of a unique name!)
If you can't, a boost/reshare would be greatly appreciated!
This is so damn fascinating, an AI playing the game, not as a speedrun but humanlike, and dynamically monologuing in the voice of the original character!
I'm really tired and at the end of my rope and could use some broad advice and help... I don't expect much...
My sibling (by choice) got arrested last night because they were in a major mental health crisis (really bad Borderline/Bipolar episode) and slapped/shoved their partner in the middle of it. I don't think that's okay for them to have done and am not remotely justifying it.
This was brought on by their breakup with their partner and the fact that they ran out of their medications.
I'm in the US, Texas to be specific. They need consistent access to medications and therapy to be remotely stable... and that just isn't available.
We moved into this apartment last year with various plans on how I would support them but all of those plans fell through. Their mother was supposed to cover their half of bills until they could get on disability, but just before move-in she basically disowned them.
They were off and on able to pick up some work initially, but were very unstable. Started this relationship with their now-ex who moved in and covered their half of the bills for most of the past year, but that ended recently and their ex is moving out.
I simply don't make enough to really cover them by myself and I can't abandon them when they're actively trying to be and do better. We just happen to live in a country and state that believes that it's better for people to just die than actually help someone who hasn't "earned it"
Just sharing a random bit of AI fun because I was bored and had a thought.
AI can incidentally translate language, but that also extends to things like converting Elizabethan English to modern-day English and even updating slang and euphemisms (though when asked for slang it'll lay it on a bit thick... but it's fun about it).
So... here's the opening prologue of Romeo & Juliet (and no, it didn't preserve iambic pentameter):
Yo, listen up, peeps! Two big-shot families, equally high and mighty,
Kicking it in fair Verona, where this drama's gonna go down.
An old beef turns into some fresh drama,
And it's so bad it gets everyone dirty.
From the messed-up love lives of these two rival crews,
A pair of star-crossed lovers are gonna do something crazy,
Their messed-up misadventures bring down a whole lotta pain,
And with their deaths, their parents' drama finally stops.
The crazy rollercoaster of their death-marked love,
And the never-ending feud between their folks,
That only ends with their tragic deaths,
Is gonna be the main event on our stage—
If y'all stick around and listen close,
We'll do our best to keep it real for ya!
Server Crash
This is about the downtime that happened last night (while I was trying to sleep, which is why it went on for so long).
The short version is a bunch of stuff clogged up the pipes, hung, and just needed a good ol' fashioned restart to fix. I've made some changes to help reduce the chance of that happening again. As always, I'm sorry for the trouble.
The longer version is that the PHP worker processes hung and bogged down the database, which brought the whole thing to a screeching halt.
I've changed the limits on those workers so they should have less impact and hopefully not do that again (the downside is that they'll be a little slower on federating updates, but most of the time that shouldn't be noticeable).
I've also taken advantage of the existing downtime to migrate the database over to a second machine with more memory. I originally intended to upgrade the memory of the machine it was on, but unfortunately made the mistake of buying the wrong chips. That plan is still pending. However, by migrating it I was able to expand the memory usage significantly, which should help performance. The downside is that it's a busier system, so the CPU is occasionally under more load, which can sometimes have a negative performance impact (it's likely negligible but I'm not super confident of that).
In the next week or two I plan to do further hardware upgrades, but with the database already migrated there should be negligible downtime, if any.
Once that's done, I'm hoping to implement some high availability options to further reduce downtimes.
If you're experiencing particularly slow load times on the network page (the default homepage with your main feed), one thing on your end you can tune is how many items it tries to load at one time.
Go to Settings -> Display -> Content/Layout and you can change "Number of items displayed per page".
Especially as an item includes a post and all of its comments as one item, this can make a drastic performance difference (on my personal feed 40 sometimes takes >30 seconds to load, but 20 takes <3 seconds).
Well, regarding disabilities:
I don't know if I should still write something like this nowadays - but so be it.
2-3 years ago I was travelling by train and there was a person without legs, in a wheelchair, who was picking on everyone else.
I kept out of it, but when I got off I inevitably had to pass him, and he also had a stupid remark in store for me.
I was really annoyed by him and asked him where he was going: to a dance contest?
*concerned silence*
He started to laugh and said that I was the only one who would take him seriously.
I see him 1-2 times a year at most, but a friendship has grown out of the situation :)
@Raroun I call that a lucky bit of chemistry. Also, just because someone isn't saying ableist things doesn't mean they aren't being ableist... walking lightly around disabled people is also its own form of ableism that gets really tiring.
And I'm not even sure your joke is what I was talking about since it was very direct and more a specific targeted jab that happened to reference their disability, you didn't even mark it as something shameful but rather just spouted an absurdity.
A prime example would be something like mocking the lisp of a Nazi. Sure, that person is a Nazi... but what about all the non-Nazis that you're also mocking at the same time?
Call For Mutual Aid!
One of my sibling's friends was hit by a car and is temporarily disabled, they're currently living off of only $70 in food stamps while they can't work and every little bit can help!
They've been in this state for a few weeks but only now broke down and asked for help (we all know how it is with hyper-independence).
They can take donations over Paypal here: bigcloud456@yahoo.com
Server News