Leaked list shows Facebook training their AI on multiple Lemmy instances
Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
I've said this many times before, but if you operate an instance, host a TERMS OF SERVICE.
It's easy to do, and gives the option of legal action against this. Please spread the word to your site admins.
For example, from Reddit's user agreement:
https://redditinc.com/policies/user-agreement
Make them run instances that can be defederated.
But if it's a public instance and they're just scraping the public website content, they haven't agreed to the terms of use, so it probably doesn't have any teeth? Besides, it's Meta, so what would one do anyway? Their lawyers will just drain your finances on court fees and continuances.
"Trespass to chattels" is a type of lawsuit in Anglo-American law that, going back some decades, could be raised in response to the abuse of a publicly accessible computer system. It was originally meant as a remedy for the diminishment of someone else's property (e.g. milking their cow). As the modern case law is understood, it allows the owner of a system (e.g. a Fediverse instance) to recover money when a tortfeasor's (e.g. Meta's) conduct interferes with the normal function of the system. The bar has been raised since the 80s: it now requires direct impact to the system, not just that someone accessed the system without explicit authorization. Even outright malice does not suffice, since the test is whether the system was degraded in some way.
Neither a run-of-the-mill scraper querying once daily nor something as minimal as an ICMP ping every second would meet the test. But AI scraping to the tune of hundreds of queries per day, adding up to double-digit percentage points of server bandwidth for a small Fediverse instance, might.
That some instance operators have had to consider adding more vCPUs or RAM, or have deployed blockers like Anubis, in response to AI scraping underscores how harmful, and thus potentially legally actionable, those actions are. That suggests a decent chance such a lawsuit could be successful.
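For an operator who wants to put a number on that impact, here is a rough sketch of how the crawler's share of requests and bandwidth could be estimated from a web server access log. It assumes the common "combined" log format, and the user-agent substrings in CRAWLER_TOKENS are my assumption about what Meta's crawlers send; check your own logs and adjust. The function name crawler_share is just illustrative.

```python
import re
from collections import defaultdict

# Assumed crawler user-agent substrings; adjust to what your own logs show.
CRAWLER_TOKENS = ("meta-externalagent", "facebookexternalhit")

# Matches the tail of a combined-format access log line:
# "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"\s*$')


def crawler_share(lines, tokens=CRAWLER_TOKENS):
    """Return (request_share, byte_share) of traffic from matching user agents."""
    totals = defaultdict(int)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        _status, size, agent = m.groups()
        nbytes = 0 if size == "-" else int(size)
        totals["requests"] += 1
        totals["bytes"] += nbytes
        # Case-insensitive substring match against the user-agent field.
        if any(t in agent.lower() for t in tokens):
            totals["crawler_requests"] += 1
            totals["crawler_bytes"] += nbytes
    if totals["requests"] == 0:
        return 0.0, 0.0
    req_share = totals["crawler_requests"] / totals["requests"]
    byte_share = totals["crawler_bytes"] / totals["bytes"] if totals["bytes"] else 0.0
    return req_share, byte_share
```

Feeding it the lines of an access log (for example, an open file object) gives two fractions between 0 and 1. This only measures what the log records, so it says nothing about CPU or database load, and it misses crawlers that disguise their user agent.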
No thanks. I'd rather instances use their money to support and improve their service than waste it fighting fucking Meta over text. What a waste of money.
Your messages aren't high-quality intellectual property, nor do they have any monetary value.
If they didn't have value they wouldn't be scraping it...