Thousands of authors demand payment from AI companies for use of copyrighted works | CNN Business

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.
For example, buying a consumer copy of a song doesn't give you the right to play that song in a stadium or a restaurant.
Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.
My point is that the restrictions can’t go on the input; they have to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).
The thing is, copyright isn't really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn't really making a copy of that work. It's transformative.
Should there be some kind of new model of remuneration for creators? Probably. But it should be a compulsory licensing model.
This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.
Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.
This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.
I completely fail to see how it wouldn't be considered a transformative work.
No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.
Now what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.
My objection is strictly on the input side, and the output is already restricted.
It's specifically distribution of the work or derivatives that copyright prevents.
So you could make an argument that an LLM that's memorized the book and can reproduce (parts of) it upon request is infringing. But one that's merely trained on the book, but hasn't memorized it, should be fine.
I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:
"He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"
It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?
Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.
What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.
It's an algorithm that's been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.
You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is. An execution of programmed tasks.
If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?
Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.
Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.
If I read your book... and get an amazing idea... Turn it into a business and make billions off of it. You still have no right to anything. This is no different.
There's been no proof or evidence provided that ANY content was ever pirated. Have any of the companies even provided the datasets they've used yet?
Why is the presumption that they did it the illegal way?
This is a little off. When you quote a book, you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?
I think the gist of these authors’ complaints is that a sort of “technology laundered plagiarism” is occurring.
Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.
That is very different than saying that you can’t feed legally acquired content into an AI.
That's part of the allegation, but it's unsubstantiated. It isn't entirely coherent.
It's not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.