A group of authors filed a lawsuit against Meta, alleging the unlawful use of copyrighted material in developing its Llama 1 and Llama 2 large language models....
“To the extent a response is deemed required, Meta denies that its use of copyrighted works to train Llama required consent, credit, or compensation,” Meta writes.
The authors further stated that, as far as their books appear in the Books3 database, they are referred to as “infringed works”. This prompted Meta to respond with yet another denial. “Meta denies that it infringed Plaintiffs’ alleged copyrights,” the company writes.
When you compare the attitudes on this to how people treated The Pirate Bay, it becomes pretty fucking clear that we live in a society with an entirely different set of rules for established corporations.
The main reason they were able to prosecute TPB admins was the claim they were making money. Arguably, they made very little, but the copyright cabal tried to prove that they were making just oodles of money off of piracy.
Meta knew that these files were pirated. Everyone did. The page where you could download Books3 literally referenced Bibliotik, the private torrent tracker where they were all downloaded. Bibliotik also provides tools to strip DRM from ebooks, something that is a DMCA violation.
This dataset contains all of bibliotik in plain .txt form, aka 197,000 books processed in exactly the same way as did for bookcorpusopen (a.k.a. books1)
They knew full well the provenance of this data, and they didn't give a flying fuck. They are making money off of what they've done with the data. How are we so willing to let Meta get away with this while we were literally willing to let US lawyers turn Swedish law upside-down to prosecute a bunch of fucking nerds with hardly any money? Probably because money.
Trump wasn't wrong: when you're famous enough, they let you do it.
You see, if you pirate a couple textbooks in college because you don't have resources, but you want to earn your right to participate in society and not starve, it's called theft.
But if one of the top 10 companies in the world does the same with thousands of books just to get even richer, it's called fair use.
I'll say this: If Meta and Facebook are prosecuted and their domains seized in the same way pirate sites are, for Meta's use of illegitimately obtained copyrighted material for profit, then I'll believe that anti-piracy laws are fair and just.
If Meta wins this lawsuit, does it mean I can download some open source AI and claim that "These million 4K Blu-ray ISOs I torrented were just used to train my AI model"?
Heck, if how you use the downloaded stuff is a factor, I can claim that I just torrented those files and never looked at them. It is more believable than Meta's argument too, because, as a human, I do not have enough time to consume a million movies in my lifetime (probably, didn't do the math) unlike AIs.
But who am I kidding, I fully expect to be sued to hell and back if I were actually to do that.
Oh so when I pirate something I get a legal notice in my mailbox and a strike against me but when Meta does it they get rewarded with H A L L U C I N A T I O N S
The profit margins in AI are fleeting at best. There's no point in squabbling over who's paying for what training data. Very, very soon it's all going to be free anyway.
Given how LLMs work, and how nearly everything of value is under copyright until at least the old age of the creator's grandchildren, LLMs would probably be pretty useless if they couldn't disregard copyright for their purposes.
Not that I have any sympathy for the likes of Meta and OpenAI in any of this.
ITT: A hilarious combination of people who have no clue what copyright covers and people who think providing a tool that allows a user to generate potentially copyrighted material is a violation of the aforementioned.
Google literally does this in every image search, but go off I guess...
I love how everybody here goes from "yay piracy" and "screw copyright" to "I can't believe they violated copyright laws" the second it's somebody they dislike.