• AHamSandwich@lemmy.world (+4) · 3 hours ago

    My experience matches the book-reproduction part exactly, from when I used an LLM to help me deal with a narcissist. I had no choice but to deal with them, for months. I was stressed and panicking from fending off constant attacks while undoing years of conditioning and abuse, raising two young kids, working, seeing a therapist, seeing a lawyer, and cramming self-help and communication books.

    My lawyer suggested it. It picked out the manipulation hidden in ‘reasonable’ requests and phrased things safely for me while I learned to do it myself. It did a good job, and it still does it better than I do. I was reading a few self-help books and began noticing familiar phrases. It was reproducing sentences, sometimes closely paraphrased, sometimes verbatim. Like, a lot.

    I feel a little bad because I probably used a bunch of water, but I got full custody. They had been doing a great job making me look like the crazy one until they started flipping out from me being calm and professional in response to their abuse instead of panicking and flipping out myself or shutting down. Sorry planet, thanks for the water, will do better next time.

  • Dogiedog64@lemmy.world (+3) · 6 hours ago

    Unfortunately, they will learn the wrong lessons from this fact; instead of trying a new approach, they’ll try to scale it more and hope that """fixes""" it (it won’t). Then, they’ll go crying and screaming and shitting their pants to the VC funds, demanding more funding for """Inference""", only to immediately light it on fire when they get it, because this technology is so incredibly wasteful it literally doesn’t make any sense to pursue unless you already make hundreds of billions of dollars a year in revenue.

    This is how the US Economy is structured now. Everything hinges on the 7 stocks that facilitate this. We are soon to be over a TRILLION DOLLARS into an infrastructure buildout for datacenters that incinerate money faster than light, for a technology that never worked and never will.

    • Technus@lemmy.zip (+23/-1) · 11 hours ago

      It’s glorified autocorrect (/predictive text).

      People fight me on this every time I say it, but it’s literally doing the same thing, just with a much longer lookbehind.

      In fact, there’s probably a paper to be written about how LLMs are just lossily compressed Markov chains.
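      Not a paper, but here is a toy version of the analogy, a minimal bigram Markov chain (purely illustrative, not how any production model is built): it predicts the next word from a lookup table keyed on the single previous word, while an LLM does the same next-token prediction but conditions on a far longer context through learned weights rather than a literal table.

```python
import random
from collections import defaultdict

# Toy corpus and bigram table: for each word, the words observed to follow it.
corpus = "the cat sat on the mat and the cat ate the fish".split()
table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)

def generate(start: str, length: int = 8) -> str:
    # Repeatedly pick a next word given only the current word (a lookbehind of one).
    word, out = start, [start]
    for _ in range(length):
        if word not in table:
            break
        word = random.choice(table[word])
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat ate the mat and the cat sat"
```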

    • CosmoNova@lemmy.world (+10/-1) · 12 hours ago

      That’s what I’ve been arguing for years. It’s not so different from printing out frames of a movie, then scanning them again and claiming it’s a completely new art piece. Everything has been altered so much that it’s completely different. However, it’s still very much recognizable, with extremely little personal expression involved.

      Oh, but you chose the paper and the printer, so it’s definitely your completely unique work, right? No, of course not.

      AI works pretty much the same. You can tell what protected material the LLM was fed by the output of a given prompt. The theft already happened when the model was trained, and it’s not that hard to prove, really.

      AI companies get away with the biggest heist in human history by being overwhelming, not by being something completely new and unregulated. Those things are already regulated, but the rules are being ignored. They have big tech and therefore politics to back them up, but definitely not the written law in any country that protects intellectual property.

    • LadyAutumn@lemmy.blahaj.zone (+17) · 15 hours ago

      Kinda. But like, a compression algorithm that isn’t all that good at exact decompression. It’s really good at outputting text that makes you think “wow, that sounds pretty similar to what a person might write”. So even if it’s entirely wrong about something, that’s fine, as long as you’d look at it and be satisfied that its answer sounded right.

      • leftzero@lemmy.dbzer0.com (+10/-1) · 14 hours ago

        It stores the shape of the information, not the information itself.

        Which might be useful from a statistics and analytics viewpoint, but isn’t very practical as an information storage mechanism.

        • TheBlackLounge@lemmy.zip (+7/-1) · 11 hours ago

          As you can learn from reading the article, they do also store the information itself.

          They learn and store a compression algorithm that fits the data, then use it to store that data. The former part is not new; the link between AI and compression theory goes back decades. What’s new and surprising is that you can get the original work out of attention transformers. Even in traditional overfit models that isn’t a given. And attention transformers shine at generality, so it’s not evident that they should do this, but all models tested do it, so maybe it’s even necessary?

          Storing data isn’t a theoretical failure; some very useful AI algorithms do it by design. It’s a legal and ethical failure because OpenAI etc. have been claiming from the beginning that this isn’t happening, and it also provides proof of the pirated work it’s been trained on.
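          For what it’s worth, a rough sketch of the kind of memorization probe being described: feed a model the start of a passage it may have seen in training and check whether greedy decoding reproduces the real continuation. The model and passage below are illustrative assumptions; whether any particular model completes any particular text verbatim varies.

```python
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prefix of a passage the model has plausibly seen many times in training.
prefix = "We hold these truths to be self-evident, that all men are"
expected = " created equal"

ids = tok(prefix, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=10, do_sample=False)  # greedy decoding
continuation = tok.decode(out[0][ids.shape[1]:])

print(repr(continuation))
print("verbatim continuation:", continuation.startswith(expected))
```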

          • leftzero@lemmy.dbzer0.com (+4) · 10 hours ago

            The images in the article clearly show that they’re not storing the data; they’re storing enough information about the data to reconstruct a rough and mostly useless approximation of it (and they do so in such a way that the information about one piece of data can be combined with the information about another one to produce another rough and mostly useless approximation of a combination of those two pieces of data, which was not in the original dataset).

            It’s like playing a telephone game with a description of an image, with the last person drawing the result.

            The legal and ethical failure is in commercially using artists’ works (as training data) without permission, not in storing or even reproducing them, since the slop they produce is evidently an approximation and not the real thing.

            • TheBlackLounge@lemmy.zip (+3/-1) · 8 hours ago

              The law disagrees. Compression has never been a valid argument; a crunchy 360p rip of a movie is a mostly useless approximation, but sharing it is definitely illegal.

              Fun fact: you can use MPEG as a very decent perceptual image comparison algorithm (e.g. for facial recognition) by using the file size of a two-frame video. This works mostly for the same theoretical reasons as neural-network-based methods. Of course, MPEG was built by humans using legally obtained videos for evaluation, but it does so without being able to reproduce any of those at all. So that’s not a requirement for compression.
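              The same compression-as-similarity idea can be sketched with any general-purpose compressor standing in for MPEG (zlib here, purely as an illustration): if two inputs share structure, compressing them together costs barely more than compressing one alone.

```python
import zlib

def csize(data: bytes) -> int:
    # Compressed size is a rough proxy for information content.
    return len(zlib.compress(data, 9))

def ncd(a: bytes, b: bytes) -> float:
    # Normalized compression distance: low for similar inputs, higher for unrelated ones.
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

x = b"the quick brown fox jumps over the lazy dog " * 20
y = b"the quick brown fox leaps over the lazy cat " * 20
z = bytes(range(256)) * 4  # unrelated bytes

print(ncd(x, y))  # low: the "second frame" compresses well given the first
print(ncd(x, z))  # higher: no shared structure to exploit
```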

    • Prove_your_argument@piefed.social (+15/-23) · 19 hours ago

      Better search results than Google, though.

      EDIT: DO NOT LOOK AT THE CHAT. LOOK AT THE SOURCE LINKS THEY GIVE YOU.

      Unless it’s a handful of official pages or discussion forums… Google is practically unusable for me now. It absolutely exploded once ChatGPT came on the scene, and SEO has gotten so perfected that slop is almost all the results you get.

      I wish we had some kind of downvote or report system to remove all the slop, but the more clicks, the more revenue from referrals… better to make people click more.

      Almost all recipe sites now give me “We see you’re using an adblocker!” until I turn on reader mode on my phone. Pretty soon that will be appropriately blocked too, and I guess I’ll go back to cookbooks or something?

      • leftzero@lemmy.dbzer0.com (+6) · 14 hours ago

        Because they intentionally broke the search engines in order to make LLMs look better.

        Search engines used to produce much more useful results than LLMs ever will (even excluding the results LLMs make up), before Google and Microsoft started pushing this garbage.

        • Prove_your_argument@piefed.social (+3) · 6 hours ago

          I don’t think so, man. The ad industry was only just starting to mass-train on SEO 15 years ago. Now it’s practically a household term.

          Google had very different goals and ambitions early on. Things changed. Now they’re like any other giant soulless corpo. Their goal is revenue, and as their platform has grown we’ve seen a metamorphosis from a focus on interesting things to a never-ending professional jester troupe on every endpoint, still making pennies on the dollar compared to what Google earns from advertisers. They’re the middlemen and should be making next to nothing, since they produce next to nothing of value, but advertising sells.

          Google Search revenue was something like $175bn in 2024. A tiny fraction of that is paid out to websites from clicks. Someone with a proper LLM tuned for SEO can churn out hot garbage nonstop and fill up results in perpetuity, with a guaranteed revenue stream far in excess of what is possible as a worker in the overwhelming majority of the world. There’s just more garbage than ever… and people have found exactly the right formula to rise to the top despite human-useless content. Google doesn’t think of it as a bad thing; $175bn in revenue from search! lol

        • fristislurper@piefed.social (+4) · 8 hours ago

          Nahh, even before LLMs became big, search results were becoming worse and worse because of all the SEO-spam.

          Now you can generate coherent-sounding articles using LLMs, making the amount of trash even bigger.

          Google making their search worse would be dumb, since all these LLMs also rely on it to some degree.

      • Honse@lemmy.dbzer0.com (+28/-5) · 19 hours ago

        No TF it’s not. The AI can only output the hallucinations that are most statistically likely. There’s no way to sort the bad answers from the good. Google at least supplies a wide range of content to sort through to find the best result.

        • Prove_your_argument@piefed.social (+18/-9) · 19 hours ago

          You misunderstand. I’m not saying the AI’s chat answers are better than a quality article. I’m saying their search results are often better.

          Don’t look at what they SAY. Look at the links they provide as sources. Many are bad, but I find I get much better info. If I try Google I might go through 10+ links before I get one that really says what I want. If I try a chatbot I typically get a relevant link within one or two clicks.

          There is no shortcut for intelligence… but AI “SEO” has not been perfected yet.

          • Honse@lemmy.dbzer0.com (+1) · 1 hour ago

            Interesting. LLMs have no ability to directly do anything but output text, so the tooling around the LLM is what’s actually searching. They probably use some API from Bing or something; have you compared results with those from Bing? I’d be interested to see how similar they are, or how much extra tooling is used for search. I can’t imagine they want to use a lot of cycles generating only like 3 search queries per request, unless they have a smaller dedicated model for that. Would be interested to see the architecture behind it and what’s different from normal search engines.
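            For what it’s worth, the common pattern is roughly what you describe: the model writes the queries and summarizes the results, while an ordinary search API does the actual searching. A minimal sketch of that wiring, with llm() and search_web() as hypothetical stand-ins rather than any vendor’s real API:

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to some language model (hypothetical)."""
    raise NotImplementedError

def search_web(query: str) -> list[dict]:
    """Placeholder for a call to a conventional search API (hypothetical)."""
    raise NotImplementedError

def answer_with_sources(question: str) -> str:
    # 1. The model turns the question into a small number of search queries.
    queries = llm(f"Write up to 3 web search queries for: {question}").splitlines()

    # 2. An ordinary search engine does the actual searching, not the model.
    results = []
    for q in queries[:3]:
        results.extend(search_web(q))

    # 3. The model only summarizes the fetched snippets and cites their URLs.
    context = "\n".join(f"[{r['url']}] {r['snippet']}" for r in results)
    return llm(f"Answer '{question}' using only these sources:\n\n{context}")
```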

          • gustofwind@lemmy.world (+10/-1) · 18 hours ago

            Yep this has happened to me too

            I used to always get the results I was looking for; now it’s just pure garbage, but Gemini will have all the expected results as sources.

            Obviously deliberate, to force us to use Gemini and make free searches useless. Maybe it hasn’t been rolled out to everyone yet, but it’s certainly got us.

            • AmbitiousProcess (they/them)@piefed.social (+3) · 14 hours ago

              I’m honestly not even sure it’s deliberate.

              If you give a probability-guessing machine like an LLM the ability to review content, it’s probably just going to be more likely to rank things the way you expect for your specific search than an algorithm built to pull the most relevant links extremely quickly, based on only some of the page’s keywords, with no understanding of how the context of your search relates to each page.

              The downside is, of course, that LLMs use way more energy than regular search algorithms, take longer to provide all their citations, etc.

              • Prove_your_argument@piefed.social (+1) · 6 hours ago

                A ton of factors have increased energy costs on the web over the years. It’s insignificant per person, but bandwidth is vastly higher because all websites have miles of crud formatting code nowadays. Memory usage is out of control. Transmission and storage of all the metadata your web browser provides in real time as you move your mouse around a page is far beyond what we had in the early days of the web.

                The energy cost of ML will fall as chips progress, but I think the financial reality will come crashing down on the AI industry sooner rather than later and basically keep it out of reach for most people anyway due to cost. I don’t see much of an ROI for AI. Right now it’s treated as a capital investment that helps inflate company worth while it lasts, but after a few years the investment is worthless, and a giant money sink from energy costs if used.

        • Prove_your_argument@piefed.social (+1) · 6 hours ago

          Genie is out of the bottle forever unfortunately.

          They could have instituted a system of genuine, authentic reviewers who manually curate content so that your search is great, but that would result in fewer clicks. Fewer clicks means less revenue. They’re financially incentivized to make you click as much as you’re willing to.

  • peopleproblems@lemmy.world (+25/-1) · 19 hours ago

    Ah, so they are manufacturing a reason for the bubble to collapse: “Look guys, we threw too many resources at it and it broke itself, not our fault”

  • Riskable@programming.dev (+32/-26) · 18 hours ago

    but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left.

    No, you cannot reasonably assume that. It absolutely did not store the visual elements. What it did was store some floating-point values related to keywords that the source image had been pre-classified with. When training, it will increase or decrease those floating-point values by a small amount when it encounters further images that use those same keywords.

    What the examples demonstrate is a lack of diversity in the training set for those very specific keywords. There’s a reason why they chose Stable Diffusion 1.4 and not Stable Diffusion 2.0 (or later versions)… Because they drastically improved the model after that. These sorts of problems (with not-diverse-enough training data) are considered flaws by the very AI researchers creating the models. It’s exactly the type of thing they don’t want to happen!

    The article seems to be implying that this is a common problem that happens constantly and that the companies creating these AI models just don’t give a fuck. This is false. It’s flaws like this that leave your model open to attack (and let competitors figure out your weights; not that it matters with Stable Diffusion, since that version is open source), not just copyright lawsuits!

    Here’s the part I don’t get: Clearly nobody is distributing copyrighted images by asking AI to do its best to recreate them. When you do this, you end up with severely shitty hack images that nobody wants to look at. Basically, if no one is actually using these images except to say, “aha! My academic research uncovered this tiny flaw in your model that represents an obscure area of AI research!” why TF should anyone care?

    They shouldn’t! The only reason why articles like this get any attention at all is because it’s rage bait for AI haters. People who severely hate generative AI will grasp at anything to justify their position. Why? I don’t get it. If you don’t like it, just say you don’t like it! Why do you need to point to absolutely, ridiculously obscure shit like finding a flaw in Stable Diffusion 1.4 (from years ago, before 99% of the world had even heard of generative image AI)?

    Generative AI is just the latest way of giving instructions to computers. That’s it! That’s all it is.

    Nobody gave a shit about this kind of thing when Star Trek was pretending to do generative AI in the Holodeck. Now that we’ve got the pre-alpha version of that very thing, a lot of extremely vocal haters are freaking TF out.

    Do you want the cool shit from Star Trek’s imaginary future or not? This is literally what computer scientists have been dreaming of for decades. It’s here! Have some fun with it!

    Generative AI uses up less power/water than streaming YouTube or Netflix (yes, it’s true). So if you’re about to say it’s bad for the environment, I expect you’re just as vocal about streaming video, yeah?

    • OctopusNemeses@lemmy.world (+13/-4) · 11 hours ago

      Do you want the cool shit from Star Trek’s imaginary future or not?

      You lost me there. Conflating a fictional future utopia with the product you’re trying to sell is a cheap trick.

      Anyone who uses this bad-faith tactic loses all credibility. Post read and disregarded.

    • melfie@lemy.lol (+2) · 8 hours ago

      Agreed, making demonstrably false or misleading arguments in an attempt to discredit something you disagree with is always a bad idea. The recent generative models are cool tech, it’s just that their benefits and potential to improve in the future are significantly overhyped due to perverse financial incentives. They’re still useful tools, they’re just not as Earth shattering as the tech bros want everyone to believe.

    • AmbitiousProcess (they/them)@piefed.social (+8/-4) · 13 hours ago

      The article seems to be implying that this is a common problem that happens constantly and that the companies creating these AI models just don’t give a fuck.

      Not only does the article never state that this is a common problem (it only explains the technical details of how it works and the possible legal ramifications), it also mentions how, according to nearly any AI scholar/expert you can talk to, this is not some fixable problem. If you take data and effectively do extremely lossy compression on it, there is still a way for that data to theoretically be recovered.

      Advancing LLMs while claiming you’ll work on fixing this doesn’t change the fact that this is a problem inherent to LLMs. There are certainly ways to prevent it, reduce its likelihood, etc., but you can’t entirely remove the problem. The article is simply about how LLMs inherently memorize data, and while you can mask it with more varied training data, you still can’t avoid the fact that trained weights memorize inputs, and when combined together, can eventually reproduce those inputs.

      To be very clear, again, I’m not saying it’s impossible to make this happen less, but it’s still an inherent part of how LLMs work, and isn’t some entirely fixable problem. Is it better now than it used to be? Sure. Is it fully fixable? Never.

      Clearly nobody is distributing copyrighted images by asking AI to do its best to recreate them. When you do this, you end up with severely shitty hack images that nobody wants to look at

      It’s actually a major problem for artists, where people will pass their art through an AI model to reimagine it slightly differently so it can’t be hit with a copyright strike, but it will still retain some of the more human choices, design elements, and overall composition.

      Spend any amount of time on social platforms with artists and you’ll find many of them now don’t complain as much about people directly stealing their art and reposting it, but more people stealing their images and changing them a bit with AI, then reposting it so it’s just different enough they can feign innocence and tell their followers it’s all their work.

      Basically, if no one is actually using these images except to say, “aha! My academic research uncovered this tiny flaw in your model that represents an obscure area of AI research!” why TF should anyone care?

      The thing is, while these are isolated experiments meant to test for these behaviors as quickly as possible with a small set of researchers, when you look at the sheer scale of people using AI tools now, then statistically speaking, you will inevitably get people who put in a prompt that is similar enough to a work that was trained on, and it will output something almost identical to that work, without the prompter realizing.

      Why do you need to point to absolutely, ridiculously obscure shit like finding a flaw in Stable Diffusion 1.4 (from years ago, before 99% of the world had even heard of generative image AI)?

      Because they highlight the flaws that continue to plague existing models, but have been around for long enough that you can run long-term tests, run them more cheaply on current AI hardware at scale, and can repeat tests with the same conditions rather than starting over again every single time a new model is released.

      Again, this memorization is inherent to how these AI models are trained, it gets better with new releases as more training data is used, and more alterations are made, but it cannot be removed, because removing the memorization removes all the training.

      I’ll admit it’s less of a “smoking gun” against use of AI in itself than it used to be when the issue was more prevalent, but acting like it’s a non-issue isn’t right either.

      Generative AI is just the latest way of giving instructions to computers. That’s it! That’s all it is.

      It is not, unless you consider every single piece of software or code ever to be just “a way of giving instructions to computers” since code is just instructions for how a computer should operate, regardless of the actual tangible outcomes of those base-level instructions.

      Generative AI is a type of computation that predicts the most likely sequence of text, or distribution of pixels in an image. That is all it is. It can be used to predict the most likely text, in a machine readable format, which can then control a computer, but that is not what it inherently is in its entirety.

      It can also rip off artists and journalists, hallucinate plausible misinformation about current events, or delude you into believing you’re the smartest baby of 1996.

      It’s like saying a kitchen knife is just a way to cut foods… when it can also be used to stab someone, make crafts, or open your packages. It can be “just a way of altering the size and quantity of pieces of food”, but it can also be a murder weapon or a letter opener.

      Nobody gave a shit about this kind of thing when Star Trek was pretending to do generative AI in the Holodeck

      That would be because it was a fictional series about a nonexistent future; it didn’t negatively affect anyone’s life today when nonexistent job roles were replaced, and most people didn’t have to think about how it would affect them if it became reality today.

      Do you want the cool shit from Star Trek’s imaginary future or not? This is literally what computer scientists have been dreaming of for decades. It’s here! Have some fun with it!

      People also want flying cars without thinking of the noise pollution and traffic management. Fiction isn’t always what people think it could be.

      Generative AI uses up less power/water than streaming YouTube or Netflix

      But Generative AI is not replacing YouTube or Netflix; it’s primarily replacing web searches. So when someone goes to ChatGPT instead of Google, that uses anywhere from a few tens of times more energy to a couple hundred times more.

      Yet they will still also use Netflix on top of that.

      I expect you’re just as vocal about streaming video, yeah?

      People generally aren’t, because streaming video tends to have a much more positive effect on their lives than AI.

      Watching a new show or movie is fun and relaxing. If it isn’t, you just… stop watching. Nobody forces it down your throat.

      Having LLMs pollute my search results with plausible sounding nonsense, and displace the jobs of artists I enjoy the art of, is not fun, nor relaxing. Talking with someone on social media just to find out they aren’t even a real human is annoying. Trying to troubleshoot an issue and finding made up solutions makes my problem even harder to solve.

      We can’t necessarily all be focusing on every single possible thing that takes energy, but it’s easy to focus on the thing that most people have an overall negative association with the effects of.

      Two birds, one stone.

      • VoterFrog@lemmy.world (+3) · 7 hours ago

        If you take data, and effectively do extremely lossy compression on it, there is still a way for that data to theoretically be recovered.

        This is extremely wrong and your entire argument rests on this single sentence’s accuracy so I’m going to focus on it.

        It’s very, very easy to do a lossy compression on some data and wind up with something unrecognizable. Actual lossy compression algorithms are a tight balancing act of trying to get rid of just the right amount of just the right pieces of data so that the result is still satisfactory.

        LLMs are designed with no such restriction. And any single entry in a large data set is both theoretically and mathematically unrecoverable. The only way that these large models reproduce anything is due to heavy replication in the data set such that, essentially, enough of the “compressed” data makes it through. There’s a reason why whenever you read about this the examples are very culturally significant.
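        A toy illustration of that replication point (nothing here resembles a real LLM): average together a dataset in which one pattern repeats hundreds of times and another entry appears once, and only the repeated pattern survives the “compression”.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set": one pattern seen 500 times (with noise), one entry seen once.
repeated = np.sin(np.linspace(0, 2 * np.pi, 64))
unique = rng.normal(size=64)
data = [repeated + rng.normal(scale=0.5, size=64) for _ in range(500)] + [unique]

# A crude "lossy compressor": keep only the average of everything it saw.
model = np.mean(data, axis=0)

print(np.corrcoef(model, repeated)[0, 1])  # high: the repeated pattern survives
print(np.corrcoef(model, unique)[0, 1])    # near zero: the one-off entry is gone
```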

      • AwesomeLowlander@sh.itjust.works (+1) · 9 hours ago

        Please see my other comment about energy / water usage. Aside from that, I’m not disputing your other points.

        Relevant excerpt:

        ChatGPT is bad relative to other things we do (it’s ten times as bad as a Google search)

        If you multiply an extremely small value by 10, it can still be so small that it shouldn’t factor into your decisions.

        If you were being billed $0.0005 per month for energy for an activity, and then suddenly it began to cost $0.005 per month, how much would that change your plans?

        A digital clock uses one million times more power (1W) than an analog watch (1µW). “Using a digital clock instead of a watch is one million times as harmful to the climate” is correct, but misleading. The energy digital clocks use rounds to zero compared to travel, food, and heat and air conditioning. Climate guilt about digital clocks would be misplaced.

        The relationship between Google and ChatGPT is similar to watches and clocks. One uses more energy than the other, but both round to zero.

        When was the last time you heard a climate scientist say we should avoid using Google for the environment? This would sound strange. It would sound strange if I said “Ugh, my friend did over 100 Google searches today. She clearly doesn’t care about the climate.” Google doesn’t add to our energy budget at all. Assuming a Google search uses 0.03 Wh, it would take 300,000 Google searches to increase your monthly energy use by 1%. It would be a sad meaningless distraction for people who care about the climate to freak out about how often they use Google search. Imagine what your reaction would be to someone telling you they did ten Google searches. You should have the same reaction to someone telling you they prompted ChatGPT.

        What matters for your individual carbon budget is total emissions. Increasing the emissions of a specific activity by 10 times is only bad if that meaningfully contributes to your total emissions. If the original value is extremely small, this doesn’t matter.

        It’s as if you were trying to save money and had a few options for where to cut:

        You buy a gum ball once a month for $0.01. Suddenly their price jumps to $0.10 per gum ball.
        
        You have a fancy meal out for $50 once a week to keep up with a friend. The restaurant host likes you because you come so often, so she lowers the price to $40.
        

        It’s very unlikely that spending an additional $0.10 per month is ever going to matter for your budget. Spending any mental energy on the gum ball is going to be a waste of time for your budget, even though its cost was multiplied by 10. The meal out is making a sizable dent in your budget. Even though it decreased in cost, cutting that meal and finding something different to do with your friend is important if you’re trying to save money. What matters is the total money spent and the value you got for it, not how much individual activities increased or decreased relative to some other arbitrary point.

        Google and ChatGPT are like the gum ball. If a friend were worried about their finances, but spent any time talking about foregoing a gum ball each month, you would correctly say they had been distracted by a cost that rounds to zero. You should say the same to friends worried about ChatGPT. They should be able to enjoy something that’s very close to free. What matters for the climate is the total energy we use, just like what matters for our budget is how much we spend in total. The climate doesn’t react to hyper specific categories of activities, like search or AI prompts.

        If you’re an average American, each ChatGPT prompt increases your daily energy use (not including the energy you use in your car) by 0.001%. It takes about 1,000 ChatGPT prompts to increase your daily energy use by 1%. If you did 1,000 ChatGPT prompts in 1 day and feel bad about the increased energy, you could remove an equal amount of energy from your daily use by:

        Running a clothes drier for 6 fewer minutes.
        
        Running an air conditioner for 18 fewer minutes. 
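        As a quick sanity check, the quoted figures are self-consistent under round-number assumptions (mine, not the excerpt’s): roughly 30 kWh of per-person daily energy use, 0.03 Wh per Google search, 0.3 Wh per chatbot prompt, and a clothes drier drawing about 3 kW.

```python
daily_wh = 30_000        # assumed per-person daily energy use, in Wh (~30 kWh)
monthly_wh = daily_wh * 30
search_wh = 0.03         # assumed energy per Google search
prompt_wh = 0.3          # assumed energy per chatbot prompt
drier_w = 3_000          # assumed clothes drier power draw, in watts

searches_per_1pct_month = 0.01 * monthly_wh / search_wh  # ~300,000 searches
prompts_per_1pct_day = 0.01 * daily_wh / prompt_wh       # ~1,000 prompts
drier_minutes = 0.01 * daily_wh / drier_w * 60           # ~6 minutes

print(searches_per_1pct_month, prompts_per_1pct_day, drier_minutes)
# 300000.0 1000.0 6.0
```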
        
      • Zos_Kia@lemmynsfw.com (+1) · 10 hours ago

        That scenario where artists get their shit stolen by passing it through AIGen to avoid copyright strikes is hilarious to me. I’d love to see examples of that, because I can’t really picture it.

    • AwesomeLowlander@sh.itjust.works (+5/-2) · 10 hours ago

      Edit: It’s interesting how this snippet always gets downvoted without explanation. Let’s not be like the crazies. Acknowledge the facts even if you don’t like the technology.

      Source for the claim on using less water than YouTube or Netflix (or even walking, for that matter)

      Using chatbots emits the same tiny amounts of CO2 as other normal things we do online, and way less than most offline things we do. Even when you include “hidden costs” like training, the emissions from making hardware, energy used in cooling, and AI chips idling between prompts, the carbon cost of an average chatbot prompt adds up to less than 1/150,000th of the average American’s daily emissions. Water is similar. Everything we do uses a lot of water. Most electricity is generated using water, and most of the way AI “uses” water is actually just in generating its electricity. The average American’s daily water footprint is ~800,000 times as much as the full cost of an AI prompt. The actual amount of water used per prompt in data centers themselves is vanishingly small.

      Because chatbot prompts use so little energy and water, if you’re sitting and reading the full responses they generate, it’s very likely that you’re using way less energy and water than you otherwise would in your daily life. It takes ~1000 prompts to raise your emissions by 1%. If you sat at your computer all day, sending and reading 1000 prompts in a row, you wouldn’t be doing more energy-intensive things like driving, or using physical objects you own that wear out, need to be replaced, and cost emissions and water to make. Every second you spend walking outside wears out your sneakers just a little bit, to the point that they eventually need to be replaced. Sneakers cost water to make. My best guess is that every second of walking uses as much water in expectation as ~7 chatbot prompts. So sitting inside at your computer saves that water too. It seems like it’s near impossible to raise your personal emissions and water footprint at all using chatbots, because spending all day on something that ends up causing 1% of your normal emissions is exactly like spending all day on an activity that costs only 1% of the money you normally spend.

      There are no other situations, anywhere, where we worry about amounts of energy and water this small. I can’t find any other places where people have gotten worried about things they do that use such tiny amounts of energy. Chatbot energy and water use being a problem is a really bizarre meme that has taken hold, I think mostly because people are surprised that chatbots are being used by so many people that on net their total energy and water use is noticeable. Being “mindful” with your chatbot usage is kind of like filling a large pot of water to boil to make food, and before boiling it, taking a pipet and removing tiny drops of the water from the pot at a time to “only use the water you need” or stopping your shower a tenth of a second early for the sake of the climate. You do not need to be “mindful” with your chatbot usage for the same reason you don’t need to be “mindful” about those additional droplets of water you boil.

      • tree_frog_and_rain@lemmy.world (+1) · 6 hours ago

        The usage metrics for energy that support his argument came from Sam Altman. And this is also terrible reasoning with regard to water usage, because it doesn’t really matter that the prompts alone are only 0.3 ml when the rest comes from generating a response. Also, it’s 0.5-1.5 l of water, not 2 ml, because again he’s using AI tech oligarchs as his source, in this case Google. So the water usage is minimized by several orders of magnitude.

        However, that 2 mL of water is mostly the water used in the normal power plants the data center draws from. The prompt itself only uses about 0.3 mL, so if you’re mainly worried about the water data centers use per prompt, you use about 300,000 times as much every day in your normal life.

        https://www.profolus.com/topics/ai-water-consumption-2025-rivals-global-bottled-water-demand/

        • AwesomeLowlander@sh.itjust.works (+1) · 6 hours ago

          Just read through your link and the journal it uses as a source. While the journal seems fine, the article itself makes claims that are not backed up by the journal and does not seem to cite any other sources for those claims. For instance, the claim that LLMs use 1.5 L of water per 100-word reply seems to have been pulled out of thin air.

          • tree_frog_and_rain@lemmy.world (+1) · 5 hours ago

            I will take a look at the original article.

            But again, I’m going to restate that the article you posted uses tech oligarchs as primary sources, which just on the face of it looks like greenwashing.

            Edit: I read the paper. Yeah, the conclusion is that tech is seriously underestimating its water and carbon footprint. We don’t have exact figures because they don’t disclose them, but from what we can gather, the information they are giving us substantially downplays the environmental impact.

            So, we don’t have exact figures, and acting as though we do, using tech-oligarch statistics, is greenwashing 🤷‍♀️

            For anyone following along.

            https://www.cell.com/patterns/fulltext/S2666-3899(25)00278-8

            • AwesomeLowlander@sh.itjust.works (+1) · 5 hours ago

              I’ve read through the sources and links, and there is sanity checking and 3rd party input. The numbers from Google were also published in a white paper, so there’s a reasonable level of transparency and verifiability. While they shouldn’t be taken entirely at their word, there’s currently little reason to think their figures aren’t at least in the ballpark of the actual data.