Thousands of authors demand payment from AI companies for use of copyrighted works

www.cnn.com | CNN Business

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.



AI Copyright@lemm.ee · BitOneZero@lemmy.world · 2y ago


www.cnn.com/2023/07/19/tech/authors-demand-payment-ai/index.html

AI Copyright@lemm.ee · RoundSparrow@lemm.ee · 2y ago




334 comments

How can they prove that their particular intellectual property, and not just some abstract public data, has been used to train the algorithms?
- Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.
  
  Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.
  
  Can it recreate anything 1:1? When my wife and I both tried to get them to do that, they would refuse, and if pushed, they would fail horribly.
  
  This is what I got. Looks pretty 1:1 to me.
  
  Hilarious that it started with just "Buddy", like you'd be happy with only the first word.
  
  Yeah, for some reason it does that a lot when I ask it for copyrighted stuff.
  
  As if it knew it wasn't supposed to output that.
  
  To be fair, you'd get the same result more easily by just googling "we will rock you lyrics".
  
  How is ChatGPT knowing the lyrics to that song different from a website that just tells you the lyrics of the song?
  
  Two points:
  
  Google spitting out the lyrics isn't OK from a copyright standpoint either. The reason songwriters/singers/music companies don't sue people who publish lyrics (even though they totally could) is that there are no damages. They sell music, so the lyrics being published for free hurts neither their music business nor their songwriting business. Other types of copyright infringement that musicians/music companies care about are heavily policed, including on Google.
  
  Content generation AI has a different use case, and it could totally hurt both of these businesses. My test from above, which got it to spit out the lyrics verbatim, shows that the AI did indeed use copyrighted works for its training. Now I can ask GPT to generate lyrics in the style of Queen, and it will basically do the lyricist's job. This can easily be done on a commercial scale, replacing the very person who wrote these lyrics. Now take this a step further and add a voice-generating AI (of which there are many) that was similarly trained on copyrighted audio samples of Freddie Mercury. Then add to the mix a music-generating AI, also fed with works of Queen, and now you have a machine capable of generating fake Queen songs based directly on Queen's works. You can do the very same with other types of media as well.
  
  And this is where the real conflict comes from.
- There are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but where this could lead is to laws requiring an accounting log of all material used to train an AI, along with all copyrights, compensation, etc.
- Personally speaking, I've generated some stupid images, like different cities covered in baked beans, and crude watermarks were generated along with them, some legible enough that I could find some of the source images used to train the AI. When it comes to photorealistic image generation, if all the AI does is mildly tweak the watermark, it's not too hard to trace back.
  
  All but a very few generative AI programs use completely destructive methods to create their models. There is no way to recover the training images outside of an infinitesimally small random chance.
  
  What you are seeing is the AI recognising that images of the sort you are asking for generally include watermarks, and creating one of its own.
  
  Do you have examples? It should only happen in cases of overfitting, i.e. too many identical images of the same subject.
  
  Here's one I generated and an image from the photographer. Prompt was Charleston SC covered in baked beans lol
  
  Out of curiosity what model did you use?
- I think that to protect creators, they either need to be transparent about all content used to train the AI (highly unlikely) or accept liability, so that if original content has been used in training the AI, the original content creator has standing for legal action.
  
  The only other alternative would be to ensure that the AI specifically avoids copyrighted or trademarked content going back to a certain date.
  
  Why a certain date? That feels arbitrary
  
  After a certain age, some media enters the public domain.
  
  Then it is no longer copyrighted.
- They can't. All they could prove is that their work is part of a dataset that still exists.
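The "spits it out 1:1" test mentioned at the top of the thread can be made a little more objective. Below is a minimal sketch, assuming you already have the model's output and the original text as plain strings: it measures what fraction of the output's word n-grams (runs of n consecutive words) appear verbatim in the original. The function names `word_ngrams` and `verbatim_overlap` and the default n=8 are hypothetical, illustrative choices, not an established audit method, and a high score only suggests that the text was seen in training; it doesn't prove which dataset it came from.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, original: str, n: int = 8) -> float:
    """Fraction of the output's word n-grams that occur verbatim in the original.

    Hypothetical audit heuristic: 1.0 means every n-word run of the output
    also appears in the original; 0.0 means none do.
    """
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output shorter than n words: nothing to compare
    orig_grams = word_ngrams(original, n)
    return len(out_grams & orig_grams) / len(out_grams)


# Usage with placeholder text (not a real copyrighted work):
original = "the quick brown fox jumps over the lazy dog and then runs away into the forest"
copied = "the quick brown fox jumps over the lazy dog and then runs away"
paraphrase = "a fast brown fox leaps across a sleepy dog before running off"

print(verbatim_overlap(copied, original, n=5))      # 1.0
print(verbatim_overlap(paraphrase, original, n=5))  # 0.0
```

Comparing word n-grams instead of whole strings tolerates differences in capitalization and punctuation, while still separating a near-verbatim copy from a paraphrase that merely imitates the style.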
