Selfhosted @lemmy.world Otter @lemmy.ca 1mo ago

Do you use anything to archive content for yourself or others? (research, videos, articles, and anything that could be lost to time or censorship)

I saw this post and I was curious what was out there.

https://neuromatch.social/@jonny/113444325077647843

I'd like to put my lab servers to work archiving US federal data that's likely to get pulled; climate and biomed data seem most likely. The most obvious strategy to me would be setting up mirror torrents on academictorrents. Is anyone compiling a list of at-risk data yet?

It's Me @lemm.ee eccentric @lemm.ee 1mo ago



38 comments

One option I've heard of in the past:

https://archivebox.io/

ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline.
- Going to check that out because... yeah. Just gotta figure out what to archive and where.
- I am using ArchiveBox; it is pretty straightforward to self-host and use.
  
  However, it is very difficult to archive most news sites with it, and many other sites as well. Cookie banners and similar pop-ups usually render the archived page unusable, and often archiving won't work at all because bot protection (Cloudflare, etc.) kicks in when ArchiveBox tries to access the site.
  
  If anyone else has more success using it, please let me know if I am doing something wrong…
  
  Monolith has the same problem here. I think the best resolution might be a browser-plugin-based solution where you could say "archive this" and have it push the result somewhere.
  
  I wonder if I could combine a dumb plugin with Monolith to do that... A weekend project perhaps.
- That looks useful; I might host that. Does anyone have an RSS feed of at-risk data?
- This seems pretty cool. I might actually host this.
- Eyy, I want that!
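
The plugin-plus-Monolith idea above could start from something like the following sketch of the receiving end: given a URL (which a browser plugin, hypothetical here, would send over), it builds a timestamped file name and shells out to Monolith using its `-o` output option. It assumes the `monolith` binary is installed and on `PATH`; the helper names `snapshot_path`, `monolith_command`, and `archive` are illustrative only and are not part of Monolith or ArchiveBox.

```python
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def snapshot_path(url: str, out_dir: Path, when: datetime) -> Path:
    """Hypothetical helper: a filesystem-safe, timestamped file name for one snapshot."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    # A short hash of the full URL keeps different pages on the same host apart.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    host = parsed.netloc.replace(":", "_")  # drop ':' from host:port
    return out_dir / f"{host}-{digest}-{stamp}.html"


def monolith_command(url: str, output: Path) -> list[str]:
    """Argument list for the Monolith CLI: save `url` as one HTML file at `output`."""
    return ["monolith", url, "-o", str(output)]


def archive(url: str, out_dir: Path) -> Path:
    """Hypothetical helper: save one page with Monolith and return the archived file's path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    output = snapshot_path(url, out_dir, datetime.now(timezone.utc))
    # check=True raises CalledProcessError if monolith exits with a non-zero status.
    subprocess.run(monolith_command(url, output), check=True)
    return output
```

For example, `archive("https://example.com/", Path("archive"))` would write a file named like `archive/example.com-<hash>-<timestamp>.html`. Note that Monolith still fetches the page itself here, so on its own this does not get past the cookie pop-ups or Cloudflare checks described in the thread; capturing the page from inside the browser would be the part the plugin has to solve.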
