endlesstalk.org
- Successfully updated to 0.19.7
Besides a small issue with how the database migrations should be run, the upgrade went fine.
There seems to be an issue with creating posts. I'm looking into it.
Let me know if there are any issues with the site.
- Upgrading to 0.19.7 on the 16th of November at 18:00 UTC, instead of 0.19.6. EDIT: Migrations are running now. The site might go down soon.
Upgrading to 0.19.7, since it contains fixes for issues discovered after 0.19.6 was released. See the release notes here for more information about the bugfixes.
For this upgrade there will be downtime. I expect it to last around 30 minutes, but it all depends on how long the database migrations take. If there are any major issues with the upgrade, you can check the uptime site here or the site status here.
Local time can be seen here
- Upgrading to 0.19.6 on the 16th of November at 18:00 UTC
0.19.6 brings a lot of changes. See release notes here for more information.
For this upgrade there will be downtime. I expect it to last around 30 minutes, but it all depends on how long the database migrations take. If there are any major issues with the upgrade, you can check the uptime site here or the site status here.
Local time can be seen here
- Images not loading. EDIT: Fixed. Images from the last 4 days lost.
Images aren't loading, and after looking into it briefly I'm not sure what the cause is.
Will look into it more when I have the time.
EDIT: The pictrs db was missing a lot of data. I'm unsure how the data was lost. I have now restored from a backup from 4 days ago (earlier backups didn't contain the data), so images from the last 4 days will be lost.
- Moved to new host again. Images for the last 6 days lost.
This should however be the last time for a long time, since I have greatly improved the setup.
The pictrs database got corrupted during the process, which is why images from the last 6 days are lost.
Let me know if there are any issues after moving to the new host.
- Restoring from backup after database corruption.
The database was corrupted, so I had to recover from a backup. About 6 hours of data was lost.
There does seem to be a weird network error sometimes. I'm looking into it. FIXED. Lastly, I apologize for all the downtime lately.
- Migrated to new host successfully.
Endlesstalk has been migrated to a new host.
There did seem to be some caching issues causing certificate errors, but it seems to be fixed (at least for me).
Let me know if there are any issues after the migration.
EDIT: The 525 SSL certificate error shows up intermittently. I'm looking into it.
EDIT2: Fixed. It was an issue with the nginx load balancer.
- Downtime caused by a server going down. Migration to new host tomorrow at 20:00 UTC (EDIT: changed from 18:00).
Earlier today one of the servers where endlesstalk is hosted went down. After some time the server came back up again, but there was some unknown issue and the server was unstable, so preparations to migrate endlesstalk to a new host began. However, after setting the new servers up, I managed to get one of the "old" servers up and running again.
Tomorrow at 20:00 UTC (originally 18:00) the migration to the new host will begin. See local time here. There will be some downtime with this, probably around an hour or less.
EDIT: The server went down again, but should be back up now.
EDIT2: Moved to 20:00 UTC, since I forgot I have something from 17:00 to 19:00 UTC.
- Successfully upgraded to 0.19.5
The upgrade went smoothly and everything seems to work.
Let me know if there is anything that doesn't work after the upgrade.
- Upgrading to 0.19.5 on the 22nd of June at 13:00 UTC
I have found the issue with the database migration, so the upgrade to the latest version of lemmy can proceed.
0.19.5 brings a lot of smaller bugfixes. See the release notes here for more information. I will also upgrade the database to a newer version (postgres 16).
For this upgrade there will be downtime, which I expect to last around an hour or less. If there are any major issues with the upgrade, you can check the uptime site here or the site status here.
Local time can be seen here
- Upgrading to 0.19.4 postponed.
The database migration to 0.19.4 failed because the database schema doesn't align with the state the migrations expect. The reason is probably that it didn't restore correctly from a previous backup, but I don't actually know the cause.
I thought I could create a new database with a correct schema and then import the data from the current database into the new one. This might still be possible, but it simply takes too long and it has gotten too late for me (03:00 at night).
I will look into a fix for the migration, and when I have one I will announce a new date for the upgrade to 0.19.4.
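For anyone curious how a schema mismatch like this can be confirmed: one way is to dump only the schema of the current database and of a fresh database with all migrations applied, then diff the two dumps. A minimal sketch of that idea (the connection strings are made up, not the real setup):

```python
# Minimal sketch: diff the schema of two postgres databases.
# The connection strings below are placeholders.
import difflib
import subprocess

def dump_schema(conn_str: str) -> list[str]:
    # pg_dump --schema-only prints the CREATE statements without any data.
    out = subprocess.run(
        ["pg_dump", "--schema-only", conn_str],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.splitlines(keepends=True)

current = dump_schema("postgresql://lemmy@localhost:5432/lemmy")       # restored DB
expected = dump_schema("postgresql://lemmy@localhost:5433/lemmy_ref")  # fresh DB with migrations applied

for line in difflib.unified_diff(expected, current, "expected", "current"):
    print(line, end="")
```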
- Updating to 0.19.4 on the 15th of June at 19:00 UTC. EDIT: Issue with the database migration. It will take a bit longer.
0.19.4 brings a lot of changes. See release notes here for more information.
There should be no downtime, or only very minimal downtime. If there are any issues, check the uptime site here or the site status here.
Local time can be seen here
Note: An update to postgres 16 and pictrs 0.5 is also coming soon, which will bring some downtime. I don't know when yet, but will post an update when I do.
EDIT: There was an issue with migrating the database while upgrading to 0.19.4, so it will take longer.
EDIT2: The database is in a different state than the migration to 0.19.4 expects. The cause is not clear, but I'm looking into it.
- [Fixed] Instability of site
Hello
I have noticed that the server has been going down a lot, for 10-20 minutes at a time.
Unfortunately, I'm currently on vacation, so I don't think I will have the time to fix it.
I will be back tomorrow evening and will look into it and hopefully fix it then.
EDIT: There was a misconfiguration in the autoscaling setup. This scaled the system up and used all of its CPU, which caused the site to be unresponsive.
This should be fixed now, but I will keep monitoring it.
- Approx. 6 hours of downtime, caused by deletion of all data.
While working on a small fix to lemmy, which was causing some unneeded CPU usage, I made a change that unfortunately caused the db and pictrs service storage to be deleted.
Thankfully I have backups of everything, so I set about restoring from a backup. However, the restore was very slow, since I used a suboptimal way to back up the db (raw SQL dump). After the first backup finished restoring, I found out that it was missing data, so I tried an older backup, but that didn't work either; it was missing data as well. So I tried a backup from another server (since I back up to 2 different servers), which finally worked.
Restoring from backups hasn't taken too long previously, since my backups are fairly small, but I will need to look into a quicker way to restore backups for lemmy, since its backups are much bigger.
NOTE: Data from ca. 2 hours before the site went down (16:00-18:00 UTC) will be missing, and I'm unable to restore it.
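On the "quicker way to restore" part: one common approach is to switch from a raw SQL dump to postgres' custom dump format, which pg_restore can restore with several parallel jobs. A minimal sketch, with made-up paths, connection details and job count:

```python
# Sketch of a faster backup/restore cycle using pg_dump's custom format.
# Paths, database names and job count are illustrative only.
import subprocess

DB = "postgresql://lemmy@localhost:5432/lemmy"
DUMP = "/backups/lemmy.dump"

def backup() -> None:
    # -Fc writes the compressed custom format instead of raw SQL.
    subprocess.run(["pg_dump", "-Fc", "-f", DUMP, DB], check=True)

def restore() -> None:
    # pg_restore can use multiple jobs (-j) with the custom format,
    # which is much faster than replaying a plain SQL dump.
    subprocess.run(["pg_restore", "-j", "4", "--clean", "-d", DB, DUMP], check=True)

if __name__ == "__main__":
    backup()
```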
- Images/pictrs not working. UPDATE: Moved to new host.
The s3 host that pictrs uses has gone down.
I might have to move to another s3 host, which means it will take a bit before images are working again. This will cause a loss of images from the last 2-3 hours before pictrs stopped working.
EDIT: I have moved to a new s3 host. I'm unsure how many images were lost during the outage.
- Updating to 0.19.2 on the 14th of January at 15:00 UTC. EDIT: Successfully updated!
Local time can be seen here
0.19.2 contains fixes for outgoing federation and a few other things. See release notes here for more information.
There should be no downtime, or only very minimal downtime. If there are any issues, check the uptime site here or the site status here.
EDIT: The server went down / was very very slow to respond. I'm not quite sure why.
- The update to 0.19
The 0.19 version is out and I expect to update sometime during the next week. Since it is a big release, I will need to spend some more time testing that everything still works fine and ensuring that the migration works without problems as well.
I will update with another post when I know when the update will take place.
- Moving to new server tomorrow (24/09) at 9:00 UTC. Update: Success.
I expect a very minimal downtime of ca. 5-15 mins.
- Downtime from 12:00 to 15:40 UTC
I had made some config changes to the database earlier, in connection with the move to a new server, that caused the storage usage of the database to grow a lot. When it had no more space left, it crashed, which caused the downtime. Unfortunately it happened at a time when I wasn't available to fix it immediately, which is why it was down for so long.
It is now fixed, and I will keep watch (I have set up an alert for database disk usage) to make sure it doesn't happen again.
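For context, an alert like this is essentially just a periodic check of how full the database volume is. A minimal sketch of the idea (the mount path, threshold and notification endpoint are made up, not the actual monitoring setup):

```python
# Minimal sketch of a disk-usage alert for the database volume.
# The mount path, threshold and webhook URL are placeholders.
import shutil
import time
import urllib.request

MOUNT = "/var/lib/postgresql/data"
THRESHOLD = 0.85                         # alert when the volume is more than 85% full
WEBHOOK = "https://example.org/notify"   # hypothetical notification endpoint

def usage_fraction(path: str) -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total

while True:
    frac = usage_fraction(MOUNT)
    if frac > THRESHOLD:
        msg = f"Database volume is {frac:.0%} full".encode()
        urllib.request.urlopen(urllib.request.Request(WEBHOOK, data=msg))
    time.sleep(300)  # check every 5 minutes
```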
- Unstable/downtime from 18:00 to 19:00 UTC
Seems to be caused by a low amount of available storage. This caused k8s to evict/delete pods -> site went down, then it would fix itself -> site goes up again, but then k8s would evict again -> site goes down. This continued until it stabilized at some point.
This should be fixed, and there should be more space available when I move the server to a new host. I expect to move to a new server sometime in the coming week. I will announce the date when I know when it will happen.
EDIT: Spoke a little bit too soon, should be fixed now though.
EDIT2: There was something that kept using storage, so I ran into the issue again. Then the volume/storage for the image service (pictrs) stopped working for some unknown reason (thankfully I have backups), so there shouldn't be any images lost.
The good news is that I have reclaimed a lot of storage, so I shouldn't be in danger of running out of space for a long time.
- Downtime from 23:32 - 23:56
While preparing for the migration to a new host, I had to set up the db, and during that I deleted a resource in k8s to force a reload of the db's settings. This caused the db to use a different volume, and it took a bit before I could revert it back to using the old volume.
No data should have been lost. Let me know if anything is missing.
- Images for the last 8 hours deleted because of CSAM.
Unfortunately the tool for scanning for CSAM didn't detect the image, so to ensure there is no CSAM on the server, images from the last 8 hours have been deleted.
- Downtime from 14:34 - 14:49: Filesystem issues for unknown reasons.
The filesystem manager (longhorn) I use reported that multiple volumes were faulted. This caused the site to go down.
I have no idea why the volumes faulted, only that a reboot of the server fixed it. Hopefully this was a strange one-off and it won't occur again.
- Switching to partly automated defederation
To make it easier to defederate from unwanted instances, I have switched to using the fediseer. With this tool I can get censures from other trustworthy instances. A censure is a "completely subjective negative judgement" (see more here), and reasons for the censure can be listed.
Currently I'm using the censures from lemmy.dbzer0.com (can be seen here) that have any of the following reasons for the censure:
- Lolicon
- CSAM
- Fascism
- Hate speech
- Bigotry
- Pedophilia
- Bestiality
- MAP
I will still manually defederate from instances when it is needed, but this makes it easier to defederate from bad instances I would have missed/didn't know about.
Note: The automated defederation also includes spam instances, which are currently defined by the following (see the sketch after the list):
- More than 30 registered users per local post + comments
- More than 500 registered users per active monthly user.
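In other words, the spam check is just two ratios computed from an instance's public stats. A minimal sketch of the idea (the field names are illustrative rather than the exact fediseer/lemmy API, and I'm assuming either condition on its own is enough to flag an instance):

```python
# Sketch of the spam heuristics described above.
# Real numbers would come from an instance's public statistics.
def looks_like_spam(users: int, local_posts: int, local_comments: int,
                    monthly_active: int) -> bool:
    activity = local_posts + local_comments
    # More than 30 registered users per local post + comment.
    too_few_posts = users > 30 * max(activity, 1)
    # More than 500 registered users per active monthly user.
    too_few_active = users > 500 * max(monthly_active, 1)
    return too_few_posts or too_few_active

# Example: 20,000 accounts but almost no activity -> flagged.
print(looks_like_spam(users=20_000, local_posts=50, local_comments=200,
                      monthly_active=10))  # True
```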
- Endlesstalk.org: Defederating from rqd2.net
The main reason is that they allow/support pedophilia, but they also allow zoophilia and biastophilia. They try to label it as MAP (minor-attracted person), but it is still pedophilia. Example of a MAP post here.
- Endlesstalk.org: Deleting images from the last hour, because of a report of CSAM.
There has been a report of CSAM, and unfortunately lemmy-safety doesn't go through the images quickly enough (on my hardware) to be of use in this case.
I think there exists a tool to purge an image via information from a post, but I wasn't able to find it now. In the future I can hopefully use that tool when reports of CSAM come in.
- Deleting potentially problematic images(Because of CSAM)
To ensure that there isn't any CSAM or other problematic images, I have set up db0's lemmy-safety tool. This will scan images and delete them if there is a high chance that an image is illegal or unethical.
Unfortunately, the tool isn't perfect, so sometimes perfectly fine images might be deleted after you have uploaded them. In that case you need to upload a different image, since a similar image will probably be flagged as well.
When the moderation tools for lemmy are better, I can hopefully remove the scanning tool, but until then I think this is the best option.
If anyone has an alternative/better idea, I would love to hear it.
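For those wondering what this looks like mechanically: the idea is a loop over uploaded images that deletes anything a classifier scores above a confidence threshold. A minimal sketch with a stubbed-out classifier (this is not db0's actual lemmy-safety code; the directory, threshold and score_image() stub are all made up):

```python
# Sketch of the scan-and-delete idea, not the real lemmy-safety tool.
# The image directory and threshold are placeholders, and score_image()
# is a stub where an actual classifier model would go.
from pathlib import Path

IMAGE_DIR = Path("/data/pictrs/files")   # hypothetical upload directory
THRESHOLD = 0.8                          # delete if the model is this confident

def score_image(path: Path) -> float:
    # Placeholder: a real implementation would run a classifier here and
    # return the probability that the image is illegal or unethical.
    return 0.0

def scan_once() -> None:
    for image in IMAGE_DIR.glob("*"):
        if image.is_file() and score_image(image) >= THRESHOLD:
            image.unlink()               # delete the flagged image
            print(f"deleted {image}")

if __name__ == "__main__":
    scan_once()
```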
- Images from the last 24 hours deleted, because of CSAM.
There has been CSAM on another instance, and since it might have federated to this instance, I have deleted the images from the last 24 hours.
New images from now on should work. Let me know if they don't.
- Migrating images to s3 storage on Wednesday 23/08 at 16:00 UTC. UPDATE: Images successfully migrated
The site will go down (a maintenance page will be shown instead) while the migration is ongoing. I expect it to take 10-20 minutes if everything goes well.
EDIT: I thought I would be able to move images in the background before the main migration, so that I could avoid having the site down for multiple hours. Unfortunately, after testing the migration, I have found that this isn't possible. So the site will probably be down for 3-4 hours, since there are about 100GB of images that need to be migrated.
EDIT2: Images have now been successfully migrated. Let me know if any images are missing.
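For anyone curious why this takes hours: moving ~100GB of small files into object storage is essentially one very long loop of uploads. A generic sketch of that using boto3 (the bucket, endpoint and local path are made up, and this is an illustration, not the actual pict-rs migration mechanism that was used):

```python
# Generic sketch of copying a local image directory into s3-compatible storage.
# Bucket, endpoint and paths are placeholders; not the pict-rs migration tool.
from pathlib import Path
import boto3

LOCAL_DIR = Path("/data/pictrs/files")
BUCKET = "endlesstalk-images"            # hypothetical bucket name

s3 = boto3.client("s3", endpoint_url="https://s3.example.org")

for path in LOCAL_DIR.rglob("*"):
    if path.is_file():
        key = str(path.relative_to(LOCAL_DIR))
        s3.upload_file(str(path), BUCKET, key)
        print(f"uploaded {key}")
```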
- Upgrading endlesstalk to 0.18.4 tomorrow (August 9th) at 14:00 UTC. EDIT: Successfully updated to 0.18.4
There shouldn't be any downtime, since it is a simple upgrade with no database migrations.
EDIT: Has now been successfully updated to 0.18.4. There were no issues, so nothing should be lost.
- Alexandrite has been updated to the official self-hosted version.
Thanks to sheodox@lemmy.world, an official self-hosted version of alexandrite is now available, which new.endlesstalk.org now uses. More features have also been added in the update (see here).
Let me know if there are any issues with the update.
- Endlesstalk will upgrade to 0.18.3 tomorrow at 12:00 UTC. UPDATE: Successfully updated to 0.18.3
There should be no downtime and no content should be lost.
- Images not working. UPDATE: Working again now.
Images are currently not working, since it seems there are corrupted files in the image service. I will probably need to use a backup to get it working again, so some images might be lost.
EDIT: Images from 16/07 16:00 CET to 17/07 13:45 have been lost, but the image service is now working again.
I think I will do more frequent backups to avoid losing too many images if it happens again.
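As a note on what "more frequent backups" could look like in practice, here is a minimal sketch that writes a timestamped archive of the image data directory (the paths are made up, and this assumes pictrs keeps its data on a local volume; in practice it would run on a schedule via cron or similar):

```python
# Sketch of a simple timestamped backup of the pictrs data directory.
# Paths are placeholders; run this periodically (e.g. from cron).
import datetime
import tarfile
from pathlib import Path

DATA_DIR = Path("/data/pictrs")          # hypothetical pictrs volume
BACKUP_DIR = Path("/backups/pictrs")

def backup() -> Path:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
    target = BACKUP_DIR / f"pictrs-{stamp}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(DATA_DIR, arcname="pictrs")
    return target

if __name__ == "__main__":
    print(f"wrote {backup()}")
```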