


Downtime


Sorry for the downtime; the server crashed and I wasn't able to address it quickly or smoothly in my current mental state.

I took advantage of the downtime (and probably exacerbated it a little) and finally did the migration back to local hardware... so it *should* be more reliable.

What honestly stretched it out a lot is that I also applied the most recent stable Friendica version, and the database update took forever.
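For anyone curious, a source-install Friendica update boils down to something like this; the install path and web-server user below are assumptions, not lifted from my actual setup.

```bash
# Rough sketch of a source-install Friendica update (path and user are assumptions).
cd /var/www/friendica
sudo -u www-data git checkout stable
sudo -u www-data git pull
sudo -u www-data bin/composer.phar install --no-dev
# The slow part on a large instance: applying schema changes to the database.
sudo -u www-data bin/console dbstructure update
```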




Major Service Notice, USpol

It's with a heavy heart that I'm letting y'all know that I'm disabling open registrations and *encouraging* all users to find a new home.

I'm not kicking anyone off my server, but unfortunately, due to the political situation here in the US and the chance for things to go *very bad very quickly*, I can not vouch for this server as reliable.

I live in Texas and need to figure out plans to evacuate at this point. I was holding out hope that we'd at least have status quo (as monstrously awful as it is) for longer.

And to make things worse, I'm a trans woman; they actively want to make my very existence (let alone presence online) illegal and have been building the machinery to make that a very fast process once Trump assumes office.

So I do not recommend this server any longer for those reasons. If you choose to stay, I'll continue to run it and support y'all.



Sorry for the downtime


The housing situation has been a pain; we're still on the temporary server environment and have been hitting major resource bottlenecks.

I'm hoping to get a place soon, but I've hit some roadblocks that will very likely push it drastically further out... in which case I'll need to spend more money on this environment to get things back to stable.

Reminder that I'm covering this entirely out of my own pocket and these constraints are because I'm between homes and can't use my own much cheaper hardware. If you appreciate this instance I would very much appreciate a donation.



<Insert Profanities>


So the temporary system had some sort of failure; I'm not even 100% sure what caused it, to be honest. It went down sometime yesterday and some of the virtual drives got corrupted, which hit both the database and the virtual gateway device.

I was able to restore the system... most of the way. Thankfully there are backups of the database, but some of them were flawed as well; the most recent intact one was from 5/16, so 5 days were lost.

To be clear, this problem was exacerbated by the fact that there's not as much redundancy in the temporary setup (sadly it looks like it'll be a few more months before I have a place of my own and can spin up my own hardware again). But I'm still going to look at how I might get the backups into better shape.

As far as how long it took: I had a busy day yesterday and didn't see that the server was down until I was too exhausted to do anything about it, so it had to wait until I got off work today... and each attempt at restoring the database takes around an hour, so it took *a while* to get everything restored.
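For the curious, each restore attempt is basically a single-threaded import of one large SQL dump, which is why it's so slow; something along these lines, with the filename and credentials as placeholders:

```bash
# Illustrative restore from a nightly dump; filename is a placeholder and this
# assumes credentials live in ~/.my.cnf. A plain SQL import is single-threaded,
# hence roughly an hour per attempt on a database this size.
gunzip -c /backups/friendica-2024-05-16.sql.gz | mysql friendica
```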



Image Upload Trouble


If y'all have had issues uploading images... sorry about that.

I missed a setting when re-configuring the server after the transfer to its current environment; it's fixed now.

Longer Explanation: there are two servers involved, a reverse proxy and then the actual Friendica server. The Friendica server accepted uploads up to 100MB... the proxy didn't have that setting... so uploads would just go for a bit, then time out. I added that setting to the proxy and it's all solved.
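For reference, on an nginx proxy (as an example; the directive name differs for other proxies) the setting in question looks like this:

```nginx
# In the reverse proxy's server/location block that fronts Friendica
# (nginx shown as an example).
client_max_body_size 100M;   # match the Friendica server's 100MB upload limit
```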



Server Migration Complete


Thank you for your patience!

The server has been migrated to a new dedicated remote server (hosted with OVH) and to the newest version of Friendica (2024.03, previously it was 2023.12).

Things should mostly stabilize for the time being, but I will be on the lookout for bugs.



Server Migration Update


The downtime for the server transfer is coming up sooner than originally planned; I've had to twist a few things to make it work, and my apologies for any inconvenience.

I deleted a large chunk of the media on the server, primarily focusing on data that hadn't been accessed in 60 days, but it does look like it hit a few more recent pieces.
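For a rough idea of what that kind of sweep looks like (assuming a filesystem storage backend with access times being tracked; the path is illustrative, not necessarily the exact command used here):

```bash
# Illustrative only: prune files not accessed in 60+ days from a filesystem
# storage backend. Path is a guess; requires atime tracking on the filesystem.
find /var/www/friendica/storage -type f -atime +60 -print -delete
```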

If you've been with us for a few months you might have lost some old photo uploads, and some old contacts might have blurry profile photos for a while until the system decides to redownload them later.

To be clear, this *only* impacted images, and we don't offer any guarantees against data loss on this server, especially for uploaded images. There's simply too much data there for me to reasonably back up at this time, and worse yet there really isn't any distinction in Friendica that allows me to back up only local media as opposed to cached remote media.



Upcoming Short Downtime


I'm going through a financial crisis right now and am forced to couch surf for the next few months, so I'm not going to have a place to run the server from ***BUT*** it won't be offline for more than a few hours.

There's going to be a short downtime sometime in the next few days. Things are a mess so I don't have an exact set time.

But I've temporarily rented a dedicated server from OVH for my needs and will be migrating the server there so that it can stay online while I work on getting a new place.

Additionally, when the time comes I will also be applying the new Friendica 2024.03 update.



Heads Up For Possible Outage


A reminder that I'm located in the US, more particularly in Texas, and this is a server run out of my home.

With the massive freeze incoming this weekend, there is a decent chance of a significant and extended power outage (Texas has a notoriously awful and poorly managed power grid, notably run completely separately from the rest of the country).




DDOSed by... facebook chat?


Apparently some facebook interface decided to DDOS the site a little over an hour ago.

It's not overwhelming the network, just an absolutely ridiculous number of requests.

I've solved it by instituting a global rate limit. It should be high enough to not affect anyone actually using the server.

The basic gist is that anything more than 10 requests a second gets a 429 error (Too Many Requests; like all error codes on this site, it'll give you a cute cat picture specific to that error). This is purely per second, so if you see that error at any point, the limit will already have reset by the time it takes you to refresh.
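For the technically curious, with nginx in front (shown as an example; the proxy isn't named above) a global limit like that looks roughly like this:

```nginx
# Sketch of a global per-client rate limit (nginx as an example).
# More than 10 requests/second from one IP gets a 429.
limit_req_zone $binary_remote_addr zone=global:10m rate=10r/s;

server {
    # ... existing server config ...
    limit_req zone=global;
    limit_req_status 429;   # Too Many Requests
}
```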



Aaaaaand We're back


So the one bad hard drive that was left went completely kaput and managed to throw the whole array into an unstable state. I couldn't boot the server until I got the replacement for the replacement drive.

Got that this morning and did a few hours of tinkering to get the array to accept the new drive while the old drive was completely removed (it didn't like that lol). But once I got the new drive in, everything came right back up.

Tomorrow I should be getting a replacement for the impaired server and I should be back to 100%.

After that, I intend to use the refund for the old one to get some extra SSDs into the two servers. That'll let me arrange things so that this site doesn't rely on the network storage and can be both faster and less prone to failure.



I made an oopsie


My apologies to everyone for the extended outage yesterday.

For background, 2 of the hard drives in the storage array this site is using have gone bad. I got replacements in and set to work migrating storage.

Unfortunately I was overconfident in the process as I had never actually needed to perform such a migration before, let alone in a live environment used by others.

I made assumptions about how the tools would operate when swapping out the drives (I used 'pvmove'...) and didn't realize that the tool I selected would lock the entire filesystem until it was done (it took ~10 hours to transfer a single disk...).
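For context, the standard pvmove-based disk swap goes roughly like this; the device and volume group names are placeholders, not the actual ones from this server:

```bash
# Typical LVM disk swap using pvmove (device/volume group names are placeholders).
pvcreate /dev/sdNEW                # prepare the replacement disk
vgextend data_vg /dev/sdNEW        # add it to the volume group
pvmove /dev/sdOLD /dev/sdNEW       # migrate extents off the failing disk
vgreduce data_vg /dev/sdOLD        # drop the old disk from the volume group
pvremove /dev/sdOLD                # clear the LVM label from it
```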

This was made worse by the fact that the second replacement drive was DOA. It actively prevented the system from booting, so I spent a couple of hours troubleshooting, thinking I had knocked something loose, before realizing the system was basically just rejecting the new drive.

There's sadly more downtime to come before this is resolved *but* it should be drastically shorter. Next time I'll be using a different tool to transfer without locking the system, so the downtime will just be 2 reboots (1 to put in the new drive, 1 to take out the old).

The replacement to the replacement drive will be arriving on Tuesday.



Reduced Performance / Reliability


One of the servers went down from hardware failure; thankfully, since I run this across multiple boxes with failover, the site is (obviously) still up.

Connections, and especially performance, might occasionally get a little spotty until that server gets replaced, as the remaining server is a tad overloaded.

It'll probably be a few weeks, unfortunately, as I don't have the spare funds to pre-purchase a replacement (the protection plan I purchased will cover it, but I've got to mail off the unit, wait for the money, then wait for financial stresses to pass enough that I can order the replacement... then a good week or two of delivery time after that).



Short Planned Maintenance Tonight


My apologies if this is inconvenient; I opted to do it on short notice without a set hour because (a) there's not a lot of activity on the server and (b) I'm really impatient.

I'm doing a hardware upgrade that requires rebooting the network storage backend, which will bring everything down for a short time. The hardware swap itself should take well under 30 minutes; most of the downtime is just going to be the database starting back up (which often takes in the range of another 30 minutes).

As part of this I'll also be deploying some software updates that require a reboot to take effect.



WTF?


I honestly haven't the foggiest idea how this happened, but apparently the DNS settings got changed a few days ago on the servers with absolutely no explanation (and to junk nonsense settings for some reason). I'm going to keep an eye on them to make sure they don't change again.

Additionally I think that created a cascade that caused the other problems.

Any posts you've made over the past 2-3 days haven't been sent to other servers, but they will start going out now.

As for the other problems: I think the bad DNS caused many processes to lag and take far longer (and far more resources) than usual, since any time the server tried to contact another server it timed out on the DNS request.



DOS Overload


There have been some recent outages of the server; I've tracked the root cause down to the server getting overloaded with requests (mostly updates from other servers). Those updates have been coming in faster than the server can process them, preventing other requests from getting through.

I've made some tweaks that I believe have resolved it, fingers crossed.

Technical explanation:

The servers ran out of php-fpm threads to handle requests. They were configured with a static count of 30 each (60 total). The threads were definitely impacted significantly by memory leaks, which is what kept the count low.

I've changed it from static to ondemand and increased the count to 100 each. I'll probably go in and increase it again since it's still pegged at that limit almost constantly, but thankfully running on-demand seems to be keeping the memory usage per thread drastically lower.

Where the static assignment of 30 threads was eating up 8GB of RAM, 100 on-demand threads are only taking up 1.3GB.

I'm going to increase it until it's either hitting memory constraints or it's no longer constantly at full capacity.
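For reference, the pool settings in question look roughly like this (the exact file path varies by distro and PHP version):

```ini
; php-fpm pool settings after the change (file path varies by distro/PHP version,
; e.g. /etc/php/*/fpm/pool.d/www.conf).
; Previously: pm = static with pm.max_children = 30 on each server.
pm = ondemand
pm.max_children = 100
pm.process_idle_timeout = 10s   ; idle workers exit after 10 seconds
```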

in reply to Server News

There's definitely some sort of time-of-day and code problem involved, as it hit again this morning even with the previous changes, though this time it only impacted updates (making posts/comments/likes, getting new posts). I think reading was unaffected because those operations are faster and require significantly less memory.

For whatever reason, sometime around midnight the server gets hit with a bunch of requests that all seem to lock up, eat up large quantities of memory, and then never exit. (With on-demand, threads exit after 10 seconds of being idle, yet there were over 100 threads running continuously from midnight until I killed them around 9am.) There was also a massive flood of updates from other servers corresponding to that window, so I think it might just be a bunch of large servers sending bulk updates or some such.

New tuning to handle that: I put firmer time limits into PHP to prevent threads from running forever. There are two options for setting max execution times, and the first was getting ignored (I think Friendica overrode it?); the second should override that and kill any threads running too long.
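Roughly speaking, the two knobs involved here are php.ini's max_execution_time (which the application can override at runtime) and php-fpm's request_terminate_timeout (enforced by the FPM master process, so it can't be overridden). A sketch with placeholder values:

```ini
; Sketch of the two layers of time limits; values are placeholders, not the
; exact numbers in use on this server.

; php.ini - can be overridden by the application at runtime (e.g. set_time_limit()),
; which would explain this one being ignored.
max_execution_time = 300

; php-fpm pool config - enforced by the FPM master process itself, so it kills
; any worker that runs too long regardless of what the application sets.
request_terminate_timeout = 300s
```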

In addition to that, I set up a rate limiter on the inbox endpoint (where other servers send updates). This should help keep that traffic from overloading the server; the majority of the time it'll just slow those requests down by a second or two, but if the server is overloaded, the rate limit should help keep it accessible for users.
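Sketching that out with nginx again (the path match, rate, and burst values are illustrative rather than the exact ones in use):

```nginx
# Sketch of an inbox-specific rate limit, assuming nginx; the path match and
# numbers are illustrative, not the exact values in use.
limit_req_zone $binary_remote_addr zone=inbox:10m rate=5r/s;

location ~ ^/inbox {
    limit_req zone=inbox burst=20;   # without nodelay, excess requests are delayed
    limit_req_status 429;
    # ... existing fastcgi/proxy config for Friendica ...
}
```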



Oops


I made a performance tweak that shouldn't have had an impact, but it resulted in a non-error being flagged as an error (I was getting 302 responses, which really just mean "look at this other address").

I've fixed the tweak; otherwise things should be a tiny bit better. I've got it recognizing a lot of the potential errors and skipping between servers more cleanly if one of them acts up.