Wikimedia has seen a 50 percent increase in bandwidth used for downloading multimedia content since January 2024 due to AI crawlers taking its content to train generative AI models. It has to find a way to address the problem, because it could slow down actual readers' access to its pages and assets...
There's not even a reason to crawl Wikipedia. You can literally download the dumps and feed them in, and those dumps are mirrored precisely so as not to DDoS Wikipedia itself.
That's what these AI crawler builders should be doing. They should be downloading the Wikipedia dump and running their own copy locally. They can download an update once a day, or however often the dump is refreshed. Wouldn't surprise me if some poor intern had to implement a bot, was fired or moved on, and it's just running with nobody maintaining it. All the while the C-suite is shovelling money into their pockets.
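For anyone curious, here's a minimal sketch of what "just download the dump" looks like. The URL layout follows the published dumps.wikimedia.org structure (per-wiki directories with a `latest` snapshot), but the exact filenames can change, so check the dump index before relying on it; the function name here is just for illustration.

```python
# Sketch: build the URL of a Wikipedia dump instead of crawling live pages.
# Layout is based on the documented dumps.wikimedia.org structure; verify
# current filenames at https://dumps.wikimedia.org/ before fetching.

def dump_url(wiki: str = "enwiki", snapshot: str = "latest") -> str:
    """Return the URL of the full article-text dump for a given wiki."""
    filename = f"{wiki}-{snapshot}-pages-articles.xml.bz2"
    return f"https://dumps.wikimedia.org/{wiki}/{snapshot}/{filename}"

# One download of this file replaces millions of page requests.
print(dump_url())
```

From there it's a single HTTP GET (ideally from one of the mirrors), a bz2 decompress, and an XML parse, all of which touch Wikimedia's servers once instead of millions of times.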