Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”

“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”

    • XLE@piefed.socialOP · 3 hours ago

      You can use zero randomization (temperature 0) to get the same answer for the same input every time, but at that point you’re sort of playing cat and mouse with a black box whose answers are still effectively arbitrary. Even if you found a false positive or false negative, you can’t really debug it out…
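
      For what it’s worth, “zero randomization” means greedy decoding: always take the single top-scoring token instead of sampling. A toy sketch of the difference, with made-up scores rather than any real model’s API:

      ```python
      import numpy as np

      def decode(logits, temperature, rng):
          """Pick a token id from raw model scores (logits)."""
          if temperature == 0:
              # "Zero randomization": always return the highest-scoring token.
              return int(np.argmax(logits))
          # Otherwise sample from the temperature-scaled softmax distribution.
          probs = np.exp(logits / temperature)
          probs /= probs.sum()
          return int(rng.choice(len(logits), p=probs))

      logits = np.array([2.0, 1.5, 0.3])  # stand-in scores for 3 candidate tokens
      rng = np.random.default_rng(0)

      print([decode(logits, 0.0, rng) for _ in range(5)])  # always [0, 0, 0, 0, 0]
      print([decode(logits, 1.0, rng) for _ in range(5)])  # mixes 0s, 1s, and 2s
      ```

      Deterministic per input, but the mapping from input to answer is exactly as opaque as before; that’s the cat-and-mouse part.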

      • Buddahriffic@lemmy.world · 2 hours ago

        Yeah, even if you turn off randomization so the same prompt always gives the same answer, you can still end up with variation from differences in the prompt wording. And who knows what false correlations it overfit to in the training data. Like one wording might bias it towards picking medhealth data while another wording might make it more likely to use 4chan data. Not sure if these models are trained on general internet data, but even one trained only on medical encyclopedias might be biased by wording towards or away from cancers, or in how severe it estimates a condition to be.
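
        To make that concrete: two wordings of the same symptom are completely different token sequences as far as the model is concerned, so determinism per input buys you nothing across wordings. A quick illustration with OpenAI’s tiktoken tokenizer (the phrasings are made up):

        ```python
        import tiktoken  # pip install tiktoken

        enc = tiktoken.get_encoding("cl100k_base")

        # Two paraphrases of the same complaint tokenize differently, so even a
        # fully deterministic model is being asked two different questions.
        a = enc.encode("I have the worst headache of my life")
        b = enc.encode("my head suddenly hurts worse than it ever has")
        print(a == b)  # False: different inputs, no consistency guarantee
        ```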

        • WorldsDumbestMan@lemmy.today · 19 minutes ago

          I see it like programming randomly until you get something that’s accidentally right; then you rate it, and it shows up every time. I think that’s roughly how it works (toy sketch below). True about the prompt wording, though that can be somewhat limited too, thanks to the army of idiot beta testers who will try every kind of prompt.

          Having said that, uh… it’s not much better than just straight-up programming the thing yourself. It’s like programming, but extra lazy, right?
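
          Toy sketch of that “rate it and it shows up every time” loop, roughly a best-of-n selection; the candidate answers and the rater are invented for the example, and real preference tuning (RLHF) is far more involved:

          ```python
          import random

          def generate(prompt, rng):
              """Stand-in 'model': spits out a random candidate answer."""
              return rng.choice(["lie down in a dark room",
                                 "take an aspirin",
                                 "seek emergency care"])

          def rate(answer):
              """Stand-in human rater: rewards the one correct answer."""
              return 1.0 if answer == "seek emergency care" else 0.0

          rng = random.Random(42)
          # Program randomly until something is accidentally right, then keep it:
          candidates = [generate("worst headache of my life", rng) for _ in range(10)]
          print(max(candidates, key=rate))  # the highest-rated answer is what gets kept
          ```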