Selfhosted @lemmy.world Otter @lemmy.ca 8mo ago

Do you use anything to archive content for yourself or others? (research, videos, articles, and anything that could be lost to time or censorship)

I saw this post and I was curious what was out there.

https://neuromatch.social/@jonny/113444325077647843

I'd like to put my lab servers to work archiving US federal data that's likely to get pulled - climate and biomed data seem most likely. The most obvious strategy to me seems like setting up mirror torrents on academictorrents. Anyone compiling a list of at-risk data yet?

It's Me @lemm.ee eccentric @lemm.ee 8mo ago

38 comments

I have a script that archives to:

Internet Archive: Digital Library of Free & Borrowable Texts, Movies, Music & Wayback Machine

Webpage archive

Ghostarchive, a website archive

Self-hosted https://archivebox.io/

I used to solely depend on archive.org, but after the recent attacks, I expanded my options.

Script: https://gist.github.com/YasserKa/9a02bc50e75e7239f6f0c8f04fe4cfb1

EDIT: Added the script. Note that it doesn't archive to ArchiveBox, since ArchiveBox's API isn't available in a stable version yet. You can add a function for that depending on your setup. Personally, I rely on Caddy and Docker, so I use the caddy-exec module [1] to execute commands, with this in my Caddyfile:

    route /add {
        @params query url=*
        exec docker exec --user=archivebox archivebox archivebox add {http.request.uri.query.url} {
            timeout 0
        }
    }

[1] https://github.com/abiosoft/caddy-exec
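
For anyone who wants a starting point before opening the gist, here is a minimal sketch of just the Internet Archive step. It is not the linked script. It assumes the Wayback Machine's public "Save Page Now" endpoint at `https://web.archive.org/save/<url>`; the function names and the User-Agent string are made up for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public "Save Page Now" prefix; the target URL is appended to it.
WAYBACK_SAVE = "https://web.archive.org/save/"


def wayback_save_url(url: str) -> str:
    """Build the Save Page Now request URL for `url` (hypothetical helper)."""
    # Keep URL delimiters as they are, but escape spaces and other unsafe characters.
    return WAYBACK_SAVE + quote(url, safe=":/?&=#%~+,;@")


def archive_to_wayback(url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture `url` and return the HTTP status code."""
    request = Request(
        wayback_save_url(url),
        headers={"User-Agent": "archive-sketch/0.1"},  # placeholder value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Performs a real network request when run directly.
    print(archive_to_wayback("https://example.com/"))
```

The other services would each need their own function, and rate limits or captchas may apply, so treat this as a sketch rather than a drop-in replacement for the script.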
- isn't this prone to a
  
  || rm -rf /
  
  or something similar at the end of the URL?
  
  if you can docker exec, you already have a lot of privileges, so make sure this is not a danger
  
  Thank you for the warning. You are correct: it's prone to command injection. I will validate the URL before executing the command. This should suffice until ArchiveBox's REST API is available in a stable release.
- Would you be willing to share it?
  
  Sure.
- I hope you are also donating to these projects, since you are uploading multiple copies to different services.
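
On the command-injection point raised in the replies, one possible shape for the "validate the URL before executing" step is sketched below. This is an assumption about how it could be done, not what the commenter actually implemented. The helper names are hypothetical, and the `docker exec` arguments are copied from the Caddyfile snippet in the thread. The sketch accepts only plain http(s) URLs and passes the command as an argument list, so no shell ever interprets the URL.

```python
import subprocess
from urllib.parse import urlsplit


def validate_url(url: str) -> str:
    """Return `url` unchanged if it looks like a plain http(s) URL, else raise ValueError."""
    # Whitespace or control characters have no place in a URL passed on a command line.
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        raise ValueError("URL contains whitespace or control characters")
    parts = urlsplit(url)
    # Requiring a scheme also rejects option-like input such as "-h".
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a hostname are accepted")
    return url


def build_command(url: str) -> list[str]:
    """Build the argv list for ArchiveBox; container and user names mirror the thread."""
    return [
        "docker", "exec", "--user=archivebox",
        "archivebox", "archivebox", "add", validate_url(url),
    ]


def archive(url: str) -> None:
    """Run the command without a shell, so `||`, `;` and similar are never interpreted."""
    subprocess.run(build_command(url), check=True)
```

With this, `build_command("https://example.com/ || rm -rf /")` raises `ValueError` instead of producing a command. Validation alone does not reduce what `docker exec` is allowed to do, so the privilege concern from the reply still applies.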
