So I've been troubleshooting the federation issues with some other admins:
(Thanks for the help)
So what we see is that when there are many federation workers running at the same time, they get too slow, causing them to time out and fail.
I had federation workers set to 200000. I've now lowered that to 8192 and set the activitypub_federation log level to debug to get queue stats:
RUST_LOG="warn,lemmy_server=warn,lemmy_api=warn,lemmy_api_common=warn,lemmy_api_crud=warn,lemmy_apub=warn,lemmy_db_schema=warn,lemmy_db_views=warn,lemmy_db_views_actor=warn,lemmy_db_views_moderator=warn,lemmy_routes=warn,lemmy_utils=warn,lemmy_websocket=warn,activitypub_federation=debug"
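For other admins who want the same queue stats: every lemmy_* entry in that filter is set to warn, which is also the default level given by the leading bare "warn". So a shorter filter should behave the same. This is a sketch, assuming RUST_LOG is parsed with the usual comma-separated target=level directives; how you pass the variable to the server (shell, systemd, docker-compose) depends on your setup.

```shell
# Sketch only: assumes standard RUST_LOG directive parsing.
# The bare "warn" sets the default level for every crate;
# the second directive raises only activitypub_federation to debug.
export RUST_LOG="warn,activitypub_federation=debug"

# Print the value to confirm what the server process will inherit.
echo "$RUST_LOG"
```

The long form does no harm. The short one is just easier to edit when you want to turn the debug logging off again.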
Also, I saw that many workers kept retrying servers that are unreachable. So, I've blocked some of these servers:
Posted this last night, but reposting for visibility:
To those experiencing federation issues with communities that aren't local, make sure to properly set your language in your profile! I thought my off-instance communities were having extremely slow federation, but the issue was that I didn't have English as one of my profile languages.
I've read about it and I don't understand. I have a list of the languages in the settings, but I have all of them and I cannot remove or add any languages.