I'd like to put my lab servers to work archiving US federal data that's likely to get pulled - climate and biomed data seem most likely. The most obvious strategy to me seems to be setting up mirror torrents on academictorrents. Is anyone compiling a list of at-risk data yet?
EDIT: Added script. Note that the script doesn't include archiving to
archivebox, since its API isn't available in a stable version yet. You can add a
function for that depending on your setup. Personally, I rely on Caddy and
docker, so I use a caddy module [1] to execute commands, with this in my Caddyfile:
Thank you for the warning. You are correct: it's prone to command injection. I will validate the URL before executing the command. This should suffice until archivebox's REST API is available in a stable release.
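For anyone wiring up something similar, here is a rough sketch of the kind of validation I have in mind. It is not the script from the edit above. The names `is_safe_url` and `archive` are made up for illustration, the 2048-character limit is arbitrary, and I'm assuming archivebox is on the PATH so that `archivebox add <url>` works (with docker you would swap in your own command prefix). The main protection is passing the URL as a single list argument with no shell involved; the checks on top of that are belt and braces.

```python
import re
import subprocess
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Conservative hostname pattern (assumption: no IPv6 literals or raw IDNs needed).
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")


def is_safe_url(url: str) -> bool:
    """Accept only plain http(s) URLs with a simple hostname."""
    if not url or len(url) > 2048:  # arbitrary upper bound
        return False
    # Reject whitespace and control characters anywhere in the URL.
    if any(ch.isspace() or ord(ch) < 32 or ord(ch) == 127 for ch in url):
        return False
    # A leading dash could be read as a command-line option.
    if url.startswith("-"):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    return bool(host) and HOSTNAME_RE.match(host) is not None


def archive(url: str) -> None:
    """Hypothetical helper: hand a validated URL to the archivebox CLI."""
    if not is_safe_url(url):
        raise ValueError(f"refusing to archive suspicious URL: {url!r}")
    # Argument list, no shell=True: the URL is never parsed by a shell.
    subprocess.run(["archivebox", "add", url], check=True)
```

The path and query are left mostly unchecked, because with no shell in between they reach the command as one literal argument. If your command runner builds a shell string instead, you would need something stricter than this.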