Suing Writers Seethe at OpenAI's Excuses in Court

floofloof@lemmy.ca · 2 years ago

makeasnek@lemmy.ml · edit-2 2 years ago

Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level, it will advance medicine and science in ways that are difficult to even imagine, it will provide personalized educational tutoring to every student regardless of income, and these people are worried about the technicality of what the AI is trained on and often don’t understand enough about AI to even make an argument about it. If people like this win, whatever country’s legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.

Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.

Allseer@futurology.today · 2 years ago

You’re humanizing the software too much. Comparing software to human behavior is just plain wrong. GPT can’t even reason properly yet. I can’t see this as anything other than a more advanced collage process.

OpenAI used intellectual property without the consent of the owners. Major fucked.

If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

hoshikarakitaridia@sh.itjust.works · 2 years ago

sampling a fraction of another person’s imagery or written work.

So citing is a copyright violation? A scientific discussion on a specific text is a copyright violation? This makes no sense. It would mean your work couldn’t build on anything else, and that’s plain stupid.

Also, to your first point about reasoning and an advanced collage process: you are right and wrong. Yes, an LLM doesn’t have the ability to use all the information a human has or to be as precise, therefore it can’t reason the same way a human can. BUT, and that is a huge caveat, the inherent goal of AI, and of neural networks in their simplest form, was to replicate human thinking. If you look at the brain and then at AIs, you will see how close the process is. Training usually works like this: the AI is given an input, it tries to produce the desired output, it gets told what the output should have looked like, and then it backpropagates the error to adjust its process. This is already pretty advanced and human-like (look at how the brain is made up and then at how AI models are made up; it’s basically the same concept).
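To make that loop concrete, here is a minimal sketch, assuming a toy "network" with a single weight that has to learn y = 2x. The names (`train_single_weight`, `learning_rate`, `epochs`) are made up for illustration, and this is nowhere near what an actual LLM looks like, but the cycle of input, guess, correction, and weight update is the same idea:

```python
def train_single_weight(examples, learning_rate=0.01, epochs=200):
    """Fit y = w * x by gradient descent on squared error (toy illustration)."""
    w = 0.0  # start with an uninformed weight
    for _ in range(epochs):
        for x, target in examples:
            prediction = w * x             # forward pass: the model's guess
            error = prediction - target    # compare with the desired output
            gradient = 2 * error * x       # derivative of error**2 with respect to w
            w -= learning_rate * gradient  # nudge the weight to shrink the error
    return w

# Hypothetical training data drawn from y = 2x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_w = train_single_weight(examples)
print(round(learned_w, 3))
```

After training, `learned_w` ends up very close to 2: the weight was never told the rule, it was only corrected after each guess.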

Now you would be right to say “well in its simplest form LLMs like GPT are just predicting which character or word comes next” and you would be partially right. But in that process it incorporates all of the “knowledge” it got from its training sessions and a few valuable tricks to improve. The truth is, differences between a human brain and an AI are marginal, and it mostly boils down to efficiency and training time.

And to say that LLMs are just “an advanced collage process” is like saying “a car is just an advanced horse”. You’re not technically wrong but the description is really misleading if you look into the details.

And for detail’s sake, this is what the paper for Llama 2 looks like, the latest big LLM from Facebook, which is said to be the current standard for LLM development:

https://arxiv.org/pdf/2307.09288.pdf

Aria@lemmygrad.ml · 2 years ago

You’re mystifying and mythologising humans too much. The learning process is very equivalent.

Allseer@futurology.today · 2 years ago

amazing

Tosti@feddit.nl · 2 years ago

I would imagine the difference is that all our laws assume a human being with all its flaws and limitations. The Savant with perfect memory etc etc is an edge case.

AI seems to industrialize something humans practice long and hard for.

Doesn’t the AI copy and store the author’s work internally and then have the AI software do its work? Then at the core are copied works that were never licensed for this.

makeasnek@lemmy.ml · edit-2 2 years ago

No that’s not how it works. It stores learned information like “word x is more likely to follow word y than word a” or “people from country x are more likely to consume food a than b”. That is what is distributed when the AI model is shared. To learn that, it just reads books zillions of times and updates its table of likelihoods. Just like an artist might listen to a Lil Wayne album hundreds of times and each time they learn a little bit more about his rhyme style or how beats work or whatever. It’s more complicated than that, but that’s a layperson’s explanation of how it works. The book isn’t stored in there somewhere. The book’s contents aren’t transferred to other parties.
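For a rough picture of what a "table of likelihoods" could look like, here is a minimal sketch, assuming a toy bigram model that only counts which word follows which. The function names and the sample sentence are made up for illustration, and a real LLM stores neural network weights, not an explicit lookup table like this:

```python
from collections import Counter, defaultdict

def build_bigram_table(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        table[current_word][next_word] += 1
    return table

def next_word_probabilities(table, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = table[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Hypothetical one-sentence "training set".
corpus = "the cat sat on the mat and the cat slept"
table = build_bigram_table(corpus)
probs = next_word_probabilities(table, "the")
print(probs)
```

In this toy version, what you would hand to someone else is `table`, a set of follow-counts per word, and `probs` is what gets consulted when picking the next word.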

Madison_rogue@kbin.social · 2 years ago

The learning model is artificial, vs a human that is sentient. If a human learns from a piece of work, that’s fine if they emulate styles in their own work. However, sample that work, and the original artist is due compensation. This was a huge deal in the late 80s with electronic music sampling earlier musical works, and there are several cases of copyright that back original owners’ claim of royalties due to them.

The lawsuits allege that the models used copyrighted work to learn. If that is so, writers are due compensation for their copyrighted work.

This isn’t litigation against the technology. It’s litigation around what a machine can freely use in its learning model. Had ChatGPT, Meta, etc., used works in the public domain this wouldn’t be an issue. Yet it looks as if they did not.

EDIT

And before someone mentions that the books may have been bought and then used in the model, it may not matter. The Birthday Song is a perfect example of copyright that caused several restaurant chains to use other tunes up until the copyright was overturned in 2016. Every time the AI uses the copied work in its output it may be subject to copyright.

Heratiki@lemmy.ml · 2 years ago

The creator of ChatGPT is sentient. Why couldn’t it be said that this is their expression of the learned works?

Madison_rogue@kbin.social · 2 years ago

https://crsreports.congress.gov/product/pdf/LSB/LSB10922

Heratiki@lemmy.ml · 2 years ago

I’ve glanced at these a few times now and there are a lot of ifs, ands, and buts in there.

I’m not understanding how an AI itself infringes on the copyright, as it has to be directed in its creation at this point (GPT specifically). How is that any different from me using a program that will find a specific piece of text and copy it for use in my own document? In that case the document would be presented by me, and thus I would be infringing, not the software. AI (for the time being) is simply software and incapable of infringement. And a company that makes the AI is not infringing simply because it used data to train its software, as the works are not copied verbatim from their original source unless specifically requested by the user. That would put the infringement on the user.

Phanatik@kbin.social · 2 years ago

There’s a bit more nuance to your example. The company is liable for building a tool that allows plagiarism to happen. That’s not down to how people are using it, that’s just what the tool does.

Heratiki@lemmy.ml · 2 years ago

So a company that makes lock picking tools is liable when a burglar uses them to steal? Or a car manufacturer is liable when someone uses their car to kill? How about knives, guns, tools, chemicals, restraints, belts, rope? I could go on and use nearly every single word in the English language, yet none of those manufacturers can be sued for someone misusing their products. They’d have to show malicious intent, which I just don’t see as possible in the context they’re seeking.

mkhoury@lemmy.ca · 2 years ago

I don’t think that Sarah Silverman and the others are saying that the tech shouldn’t exist. They’re saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it’s good for the industry and it’s good for the people.

Dr Cog@mander.xyz · 2 years ago

I don’t need to negotiate with Sarah Silverman if I’m handed her book by a friend, and neither should an AI.

Noved@lemmy.ca · 2 years ago

But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that’s extremified, but it’s the argument they’re making.

ag_roberston_author@beehaw.org · 2 years ago

An LLM isn’t human and shouldn’t be treated the same as a human. It’s as foolish as corporate personhood.

Dr Cog@mander.xyz · 2 years ago

The argument is less that an LLM is a human and more that it is not a copyright violation to use a material to train the LLM. By current legal definitions, it is fair use unless the material is able to be reproduced in its entirety (or at least, in some meaningful way).

ag_roberston_author@beehaw.org · 2 years ago

By current legal definitions

Yeah, definitions that were written before this technology existed. I don’t base my opinions on what is legal; legality is nothing more than rules determined by those in power.

Instead, I base them on what is ethical, and the consumption of material by LLMs and other AIs without the express permission of its creator is unethical.

HubertManne@kbin.social · 2 years ago

It’s a bit more than that if the AI is told to make something in the style of.

andruid@lemmy.ml · 2 years ago

I mean, people have been doing new works in the style of other artists for a while as well.

HubertManne@kbin.social · 2 years ago

Yeah, but again, they can’t crank out a new one every 5 minutes, and actually it would overwhelm the courts, as it’s very easy for those works to be too similar. Take the guy who tried to sue Disney by writing a book based on Finding Nemo when he found out they were making a story like that. He was shady and tried to play timeline games, but he did not need to make a story just like it.

Franzia@lemmy.blahaj.zone · 2 years ago

Amazing how every generation of technology has an asshole billionaire or two stealing shit to be the first in line to try and monopolize society’s progress.

ag_roberston_author@beehaw.org · 2 years ago

This technology takes human creativity and output to a whole new level,

No, it doesn’t. There’s nothing “human” or “creative” about the output of AI.