Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.
That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.
What’s new: The Northern District of California has granted summary judgment for Anthropic, holding that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and therefore the use of this material was not “fair”. The court also announced that it will hold a trial on the pirated copies and any resulting damages, adding:
“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
OK, so you can buy books and scan them, or buy ebooks, and use them for AI training, but you can't just download pirated books from the internet to train AI. Did I understand that correctly?
It's extremely frustrating to read this comment thread because it's obvious that so many of you didn't actually read the article, or even half-skim it, or even attempt to comprehend the title for more than a second.
Unpopular opinion but I don't see how it could have been different.
There's no way the West would give the AI lead to China, which has no desire or framework to ever accept this.
Believe it or not, transformers are actually learning by current definitions, not regurgitating a direct copy. It's transformative work - it's even in the name.
This is actually good, as it prevents a market moat for the super-rich corporations that would be the only ones able to afford expensive training datasets.
This is an absolute win for everyone involved other than copyright hoarders and mega corporations.
Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.
So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
I think this means we can make a torrent client with a built-in function that uses 0.1% of one CPU core to train an ML model on anything you download. You can download anything legally with it then. 👌
It's pretty simple as I see it. You treat AI like a person. A person needs to go through legal channels to consume material, so piracy for AI training is as illegal as it would be for personal consumption. Consuming legally possessed copyrighted material for "inspiration" or "study" is also fine for a person, so it is fine for AI training as well. Commercializing derivative works that infringe on copyright is illegal for a person, so it should be illegal for an AI as well. All produced materials, even those inspired by another piece of media, are permissible if not monetized; otherwise they need to be suitably transformative. That line can be hard to draw even when AI is not involved, but that is the legal standard for people, so it should be for AI as well. If I browse through DeviantArt and learn to draw similarly to my favorite artists from their publicly viewable works, and make a legally distinct cartoon mouse by hand in a style that is similar to someone else's, and then I sell prints of that work, that is legal. The same should be the case for AI.
But! Scrutiny for AI should be much stricter given the inherent lack of true transformative creativity. And any AI that has used pirated materials should be penalized either by massive fines or by wiping their training and starting over with legally licensed or purchased or otherwise public domain materials only.
That almost sounds right, doesn't it? If you want 5 million books, you can't just steal/pirate them, you need to buy 5 million copies. I'm glad the court ruled that way.
I feel that's a good start. Now we need clearer regulation on what fair use is, what counts as transformative work and what doesn't, and how that relates to AI. Since it's quite a disruptive and profitable business, I believe we should maybe make those companies pay some extra, not just what I pay for a book. But the first part, that "stealing" can't be "fair", is settled now.
Books are inherently an artificial construct.
If I read the books I train the A(rtificially trained)Intelligence in my skull.
Therefore the concept of me getting them through "piracy" is null and void...