Linux lays down the law on AI-generated code, says yes to Copilot, no to AI slop, and humans take the fall for mistakes — after months of fierce debate, Torvalds and maintainers come to an agreement

Lee Duna@lemmy.nz · 1 month ago

Linux lays down the law on AI-generated code, says yes to Copilot, no to AI slop, and humans take the fall for mistakes — after months of fierce debate, Torvalds and maintainers come to an agreement

FauxLiving@lemmy.world · edit-2 1 month ago

Given the research that you’ve done here I’m going to assume that you’re looking for an answer and not simply taking us on a gish gallop.

Your premise, and what appears to be the primary source of confusion, is built on the idea that this is ‘stolen’ work which, from a legal point of view, is untrue. If you want to dig into why that is, look into the precedent setting case of Authors Guild, Inc. v. Google, Inc. (2015). The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law. i.e. It is legal, not stealing.

The case you linked from Munich shows that other country’s legal systems are interpreting AI training in the same way. Training AI isn’t about memorization and plagiarism of existing work, it’s using existing work to learn the underlying patterns.

That isn’t to say that memorization doesn’t happen, but it is more of a point of interest to AI scientists that are working on understanding how AI represents knowledge internally than a point that lands in a courtrooom.

We all memorize copyrighted data as part of our learning. You, too, can quote Disney movies or Stephen King novels if prompted in the right way. This doesn’t make any work you create automatically become plagarism, it just means that you have viewed copyrighted work as part of your learning process. In the same way, artists have the capability to create works which violate the copyright of others and they consumed copyrighted works as part of their learning process. These facts don’t taint all of their work, either morally or legally… only the output that literally violates copyright laws.

The pragmatism here is recognizing that these tools exist and that people use them. The current legal landscape is such that the output of these tools is as if they were the output of the users. If an image generator generates a copyrighted image then the rightsholder can sue the person, not the software. If a code generator generates licensed code then the tool user is responsible.

This is much like how we don’t restrict the usage of Photoshop despite the fact that it can be used to violate copyright. We, instead, put the burden on the person who operates the tool

That’s what is happening here. Linus isn’t using his position to promote/enforce/encourage LLM use, nor is he using his position to prevent/restrict/disallow any AI use at all. He is recognizing that this is a tool that exists in the world in 2026 and that his project needs to have procedures that acknowledge this while also ensuring that a human is the one responsible for their submissions.

This is the definition of pragmatism (def: action or policy dictated by consideration of the immediate practical consequences rather than by theory or dogma).

e: precedent, not president (I’m blaming the AI/autocorrect on this one)

mimavox@piefed.social · 1 month ago

Training AI isn’t about memorization and plagiarism of existing work, it’s using existing work to learn the underlying patterns.

Thank you. This is exactly what people misunderstands. LLMs aren’t gigantic databases that just shuffles information that they’ve copied from the internet.

bss03@infosec.pub · 1 month ago

The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law

This judgement was reversed by the next federal judge that reviewed AI, in the Meta case.

It is far from legally settled whether training is fair use or not.

FauxLiving@lemmy.world · 1 month ago

Well, cynically, the Supreme Court will decide and Team AI has more money to buy RVs and luxury vacations.