• 4 Posts
  • 1.79K Comments
Joined 2 years ago
Cake day: March 22nd, 2024


  • I think it will massively correct, like the dotcom bubble for websites. LLMs are a useful utility, but not something that’s going to make economics irrelevant (like people thought about the internet).

    Why? LLMs are tools, text models, not AGI magic lamps, and a couple of con artists are trying to convince the world otherwise. That’s an oversimplification, but it’s the gist of it.

    And I’m no LLM skeptic. I’ve been playing with ML as a hobby for a decade, with local LLMs before ChatGPT was even available, but the market attitude towards all this is absolutely bonkers. It’s worse than crypto.




  • It’s more complex than that. The weights of big models are distributed, and then tokens are processed in parallel for multiple users. The setup varies, but it could be 8 GPUs serving many dozens of users at once, or bigger sets with even more parallelism.


    I think the bigger problem is that Copilot is… shit.

    It’s probably some ancient, inefficient architecture, not something super sparse and hardware efficient like (say) Deepseek V4, or Kimi 2.6, or Gemini Pro.

    And literally every interesting dev team Microsoft has ever acquired (Phi, WizardLM, many more), and any interesting innovation they figured out, has just disappeared into a black hole.

    They don’t have custom hardware, either, like Huawei NPUs or Cerebras WSEs, or Google TPUs. They’ve written some very interesting papers on that, and proceeded to do squat with them.

    Also, it is AWFUL for its size. Tiny models that are basically free run circles around Copilot.

    What I’m getting at is that Copilot is probably the most inefficient LLM out there. Like, it’s impressive how bad it is.


  • I use sigma N sampling at 1.0, a slop phrase banlist, and maybe a little rep penalty.

    Beyond that it depends on the usage.

    For scripts or “questioning a document,” it’s as low as can be until it loops. I start with zero temperature. But I don’t really use Gemma for coding, TBH, and it’s not good for longer documents.

    If it’s for a specific language or a very specific script, I sometimes constrain grammar for the language.

    For more “general” writing, like brainstorming or RP or whatever, I start at around 0.7 with minimal DRY sampling and look at the logit percentages in the Mikupad UI. Especially “important” tokens like names or information recall. If the probability of getting correct answers is too low, I turn the temperature down.

    …But honestly, I tend to use big MoEs instead of Gemma for that, too.


    And if none of this makes any sense…

    Yeah. That’s the problem.

    Sampling was supposed to be a temporary stopgap until looping and such was figured out, but the big LLM devs just never addressed it in production. There are all sorts of interesting papers, including one from Google about sampling logits per-layer, but they don’t implement any of them in the API models.
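    For anyone lost on the terms above, here is a rough sketch of what temperature and a sigma-based cutoff do to a token distribution. It reads “sigma N sampling” as top-nσ filtering (keep tokens whose logit is within n standard deviations of the best one), which is an assumption about what is meant. The token names and logits are made up, and this is not how any particular backend implements it.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Zero temperature is treated as greedy: all mass on the best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # exp(-inf) is 0.0 for masked tokens
    total = sum(exps)
    return [e / total for e in exps]

def top_n_sigma_mask(logits, n=1.0):
    """Assumed top-n-sigma rule: keep logits >= max - n * std(logits)."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * std
    return [x >= threshold for x in logits]

def filtered_probs(logits, temperature=0.7, n=1.0):
    """Mask unlikely tokens first, then apply temperature to what is left."""
    keep = top_n_sigma_mask(logits, n)
    masked = [x if k else float("-inf") for x, k in zip(logits, keep)]
    return softmax(masked, temperature)

# Hypothetical next-token candidates after "Her name is".
tokens = ["Alice", "Alicia", "Bob", "the", "zebra"]
logits = [5.0, 3.5, 1.0, 0.5, -2.0]

for temp in (1.0, 0.7, 0.3):
    probs = filtered_probs(logits, temperature=temp, n=1.0)
    row = ", ".join(f"{t}={p:.2f}" for t, p in zip(tokens, probs))
    print(f"temp={temp}: {row}")
```

    The printed rows are the same kind of per-token percentages you would eyeball in a UI: if the “correct” name is not getting enough probability, dropping the temperature pushes more mass onto it.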


  • Gemini actually has a really interesting architecture, hence it has fast responses, and it’s easily the best long context model out there.

    And outside of benchmaxxing or pure coding, Gemma is very good for its size. 12B is an incredible multimodal LLM, the only one natively trained for image/text ingestion without a mmproj hacked on at the end.

    …But it sure feels like executive meddling kills it.

    The pattern I see is:

    • Gemini preview is released.

    • It’s genuinely good! It’s smart, it’s straight.

    • Then they “refine” it, and it gets more and more sycophantic, more deep fried. Long context performance degrades… benchmark scores go up, but anyone who actually uses it can immediately tell it’s gotten worse.

    • Only then, is it released for mass use.

    It’s obvious they took a good model, then enshittified it to make their bosses happy and tech bros on Twitter excited.

    Gemma has the same pattern. Researchers tease the local community, delay it, and then when a new Gemma finally comes out, it turns out to be using some old SWA architecture. And the biggest model is cut. And only a smaller one uses the multimodal training.

    It’s obvious it was neutered to not “threaten” the Gemini API or be too “unsafe.”


    Another thing I’ve noticed is that both Gemini and Gemma are awful with their default 1.0 temperature/top-p 0.95. Sampling completely screws them up. But they like low temperature + minp, and Gemma loves constrained sampling.

    But 99% of users don’t know anything about sampling, so that’s going to leave a bad impression.
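    To make the top-p vs minp difference concrete, here is a small sketch on made-up logits with one strong token, one decent alternative, and a flat tail of junk. The numbers are invented for illustration and say nothing about Gemini’s or Gemma’s real distributions; the two filters are written from their usual definitions, not taken from any specific library.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given (positive) temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_keep(probs, top_p=0.95):
    """Nucleus filter: keep the most likely tokens until their mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

def min_p_keep(probs, min_p=0.1):
    """min-p filter: keep tokens with at least min_p times the top probability."""
    cutoff = min_p * max(probs)
    return {i for i, p in enumerate(probs) if p >= cutoff}

# Hypothetical logits: two plausible tokens and six equally unlikely ones.
logits = [4.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
probs = softmax(logits, temperature=1.0)

print("top-p 0.95 keeps:", sorted(top_p_keep(probs, 0.95)))
print("min-p 0.10 keeps:", sorted(min_p_keep(probs, 0.10)))
```

    With this tail, top-p 0.95 has to pull in most of the junk tokens to reach its mass target, while minp drops everything that is not at least a tenth as likely as the top choice. That is one plausible reason the defaults feel worse than low temperature + minp.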


  • Not just them. GLM, Qwen, Kimi, Stepfun, Baidu’s models. Z-Image. Small finetuners, Huawei’s prototype. There’s even a Chinese fast food chain that trains a ridiculously good audio/text mixed model (Longcat).

    I actually thought the recent Deepseek preview was a little underwhelming and “deep fried” compared to competition, though maybe it’s just underbaked. And the architecture is interesting.

    Gemma is great, too, if Google would actually unrestrain it and give it Gemini’s architecture.

    Europe is struggling though. Mistral (and everyone else) basically can’t do anything because the EU left regulation ambiguous. However strictly they regulate AI (and it should be pretty strict), anything is better than “we have no idea if we’ll get litigated, the law is clear as mud and might change.” They have at least one communal training project too, but everything I’ve seen is weirdly dated, architecture wise, like they’re living two years in the past.


  • A constriction on GPUs is literally the best thing to ever happen to Chinese ML dev.

    It made them thrifty, it made them focus, it forced them to go open weights, it made them build proper ASICs, research new techniques, and pay engineers to implement them. Now their models are supremely efficient, dirt cheap, running Nvidia free on Huawei NPUs, and close to being better tools than the US models.

    Meanwhile, US models are all (except maybe Google) enshittifying and getting benchmaxxed. Engineers are wasting man hours hopelessly trying to scale training, which does not scale like people think, and are literally giving GPUs busywork to meet utilization quotas. They’re trying to scale data and parameter count, without improving architecture or data quality or even basic problems like random token sampling, and it’s not working anymore.

    At the same time, the big US AI houses have squashed nearly every bit of “garage innovation” I’ve seen. Cool teams, hero devs with proven work on a budget, they all just disappear into the maw of Microsoft or whomever like it’s a black hole, their work never integrated into anything.

    US AI is GOING to collapse because we gave all the money to tech bros so they can poison the well. The ML research community has been screaming this since like 2022. And apparently before, as Aaron Swartz allegedly identified Altman as a sociopath right before he died by suicide.


    Sorry to rant.

    Not that China doesn’t have significant dev issues, to be clear.

    Europe, too.

    But this is a sensitive point for me. Hobbyist machine learning has been a passion of mine for a decade, and it makes me sick to hear people quote Altman, like throwing GPUs at tech bros is going to fix this. That. Is. A. LIE.


    I don’t have a solution either. In the AI space, I do not even see a path back to moonshot-style cooperative innovation like the US has repeatedly pulled off before.



  • Can we agree that Brave:

    • Is scummy.

    • Has a shady CEO, and a shady history.

    • Is possibly a security risk.

    • Is still orders of magnitude better than using Google Chrome.

    And that:

    • This headline is both true and clickbait-ish.

    • You can turn these things off in Brave’s settings, for free.

    • That doesn’t make this feature not scummy.

    Basically no one should be using Brave, but no one should be using Google Chrome either, yet here we are.

    And the revolving door of “best unabandoned Chromium fork to use” (Helium for the moment, or Ungoogled Chromium if you don’t mind some broken features, just to name two) is buried under so much SEO that it’s legitimately difficult to research.

    So… I’m not gonna go out of my way to flame Brave users. If they’re trying to do better than Chrome, good! Not-Google is good. They can pay for this I guess. I’m not installing Brave, though, I’m not recommending it, and this certainly isn’t making me want to.



  • They’re calling this out because Anthropic is afraid of dirt cheap, “good enough” open weights models undercutting them. Probably very afraid now that even Nvidia is on that boat, with huge Nemotron models.

    The real battle isn’t pro AI vs anti AI. It’s closed weights answers-as-a-premium-service vs open weights, hackable tools. It’s Huggingface vs OpenAI. It’s akin to Lemmy vs Reddit.

    Why would anyone use Anthropic once people figure out LLMs are configurable tools, not “AGI,” and efficient ones cost like 2 orders of magnitude less to run?

    So they want to squash open research. Businesses are asking about costs now. They don’t yet realize they can just host assistants on-prem or through dirt cheap competing providers, but they’re starting to figure it out.