• 0 Posts
  • 82 Comments
Joined 1 year ago
Cake day: June 23rd, 2023


  • You say “Not even close.” in response to the suggestion that Apple’s research can be used to improve benchmarks for AI performance, but then later say the article talks about how we might need different approaches to achieve reasoning.

    Now, mind you - achieving reasoning can only happen if the model is accurate and works well. And to have a good model, you must have good benchmarks.

    Not to belabor the point, but here’s what the article and study say:

    The article talks at length about the reliance on a standardized set of questions (GSM8K) and how the questions themselves may have made their way into the training data. It notes that modifying the questions dynamically leads to decreases in performance of the tested models, even if the complexity of the problem to be solved has not gone up.

    The third sentence of the paper (Abstract section) says this: “While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.” The rest of the abstract goes on to discuss (paraphrased in layman’s terms) that LLMs are ‘studying for the test’ and not generally achieving real reasoning capabilities.

    By presenting their methodology - dynamically changing the evaluation criteria to reduce data pollution and require models be capable of eliminating red herrings - the Apple researchers are offering a possible way benchmarking can be improved.
    Which is what the person you replied to stated.

    The commenter is fairly close, it seems.
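
As a rough illustration of the "dynamically changing the questions" idea, here is a minimal Python sketch. It is not the Apple researchers' code or templates: the template text, the names, the distractor sentence, and the `naive_model` stand-in are all hypothetical, made up only to show the shape of the approach.

```python
import random
import re

# Hypothetical GSM8K-style template; names and numbers are re-drawn each time.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)
NAMES = ["Ava", "Ben", "Chen", "Dara"]


def make_variant(rng, with_distractor=False):
    """Return (question, answer) with fresh surface details.

    The underlying arithmetic (a + b) never changes, so the problem is
    no harder than before; only the wording and values differ.
    """
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    distractor = ""
    if with_distractor:
        # A red herring: a number that has nothing to do with the answer.
        k = rng.randint(1, 5)
        distractor = f"{k} of the apples are a bit smaller than average. "
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    return question, a + b


def accuracy(model, rng, n=100, with_distractor=False):
    """Fraction of freshly generated variants the model answers correctly."""
    correct = 0
    for _ in range(n):
        question, answer = make_variant(rng, with_distractor)
        if model(question) == answer:
            correct += 1
    return correct / n


def naive_model(question):
    """Toy stand-in for a pattern-matching model: adds every number it sees."""
    return sum(int(tok) for tok in re.findall(r"\d+", question))


if __name__ == "__main__":
    rng = random.Random(0)
    print("clean:", accuracy(naive_model, rng))
    print("with red herring:", accuracy(naive_model, rng, with_distractor=True))
```

The toy model gets every clean variant right and every red-herring variant wrong, which is the kind of gap a fixed question set cannot reveal.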


  • Monument@lemmy.sdf.org to Memes@lemmy.ml, “Toxicity” · English · 3 upvotes · 13 days ago

    That’s very fair, indeed.

    Perhaps awareness of one will spark awareness of the other. I suppose my concern is that plasticisers are sort of a ‘hidden’ risk, for the most part. They’re used in nearly all food packaging (and prep equipment, such as hoses) that isn’t glass, or food served up in its own peel.


  • Monument@lemmy.sdf.org to Memes@lemmy.ml, “Toxicity” · English · 42 upvotes · 13 days ago

    Microplastics are terrifying and all that, but I’m sort of more worried about plasticisers like BPA, BPF, BPS and the rest of the alphabet of BP-whatevers that were created and brought into use after the dangers of BPA were realized.

    Just a heads up - if something plastic says it’s BPA-free, it probably uses a different bisphenol compound that is less studied than BPA. And is likely as toxic (or even more toxic)!

    But nobody ever talks about those, because science words.






  • I haven’t yet looked at the map (I will!), but I’m struck by the idea that perhaps a map should exist that shows how USDA hardiness zones will shift. (I mean - according to best guesses.)

    If I had the ability, it would be interesting to make a map that asks users what their favorite local tree or animal is, and tells them how long it will be able to survive near them. Nearly impossible to account for all use cases, but I digress. Even simpler - Go for a map of state trees, flowers, and animals with extinction times for each to let folks know how long each state can hold onto its signature species. Well, for the ones that aren’t already gone, anyway.









  • Honestly kind of excited for the company blogs to start spitting out their disaster recovery crisis management stories.

    I mean - this is just a giant test of disaster recovery crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

    If a company uses out-of-band management (IPMI on servers, or Intel AMT, sold as part of vPro, on desktops and laptops), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
    But that’s obviously predicated on them having already deployed/configured the tools.




  • I’m cynically viewing this as not a positive. I assume this is so they can make pages 2, 3 and so on as spammy as page 1.

    Not at first, obviously. You don’t boil that frog on high heat.
    You throw out a second page with a cute little text ad off to the side, then 1 or 2 at the top, then a mid-page ad. Maybe some suggested content.

    Instead of having to scroll through a page’s worth of ads to get to semi-relevant results with a gem hidden in them, it’ll be a page’s worth of ads for your semi-relevant results per page, and maybe what you were looking for 4 or 5 pages in.

    Google used to be good. They ‘know’ what people are looking for. So they’ll probably hire someone familiar with gambling to figure out a minimum dispersion of relevant results on the pages, to keep people using the service and scrolling past ads. … I used to remember this. Variable-ratio reward schedule?