☆ Yσɠƚԋσʂ ☆

  • 2.55K Posts
  • 3.36K Comments
Joined 6 years ago
Cake day: January 18th, 2020




  • That’s part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

    Rather than using a single neural net that's, say, 512 units wide, you split it into eight channels/experts of 64 each. If the network can pick the correct channel for each inference, then you only have to run 1/8th of the neurons on every forward pass. Of course, once you have your 8 experts in parallel, you need to decide which one handles each token you want to process. That's the job of the router: it takes in an input and decides which expert to send it to. The router itself is a tiny neural network, essentially a matrix that maps the input vector to an expert choice, with its own small set of trainable weights that get trained together with the rest of the MoE.
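    The routing step above can be sketched in a few lines of numpy. This is a toy illustration with the dimensions from the comment (512-wide input, 8 experts of 64), random untrained weights, and top-1 routing; real MoE layers typically route top-k and add load-balancing losses.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, d_expert = 512, 8, 64

    # Router: one trainable matrix mapping the input vector to a score per expert.
    W_router = rng.normal(0, 0.02, (d_model, n_experts))

    # Each expert is its own small weight matrix.
    W_experts = rng.normal(0, 0.02, (n_experts, d_model, d_expert))

    def moe_forward(x):
        logits = x @ W_router                # (n_experts,) router scores
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                 # softmax over experts
        chosen = int(np.argmax(probs))       # top-1 routing
        # Only the chosen expert's weights run on this forward pass,
        # i.e. 1/8th of the expert parameters.
        return chosen, x @ W_experts[chosen]

    x = rng.normal(size=d_model)
    expert, out = moe_forward(x)
    ```

    The router adds almost no compute (one 512×8 matmul) compared to the expert it skips, which is where the savings come from.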


  • The trick they use is pretty clever. When you ask an AI to write code, it doesn’t always get it right. Sometimes the code has bugs, sometimes it misunderstands the problem entirely. A naive way to address that is to generate a few solutions and test each one. The odds that at least one works go way up. ATLAS generates multiple attempts, running each through a test suite. Each retry also gets told what went wrong with the previous attempt, so it can try to avoid the same mistake.

    But this can be pretty slow since you have to run the code in an isolated environment, check the outputs, wait for it to finish. Doing that for every candidate quickly adds up. So ATLAS has another shortcut for avoiding unnecessary testing. Instead of simply generating solutions and testing all of them, it tries to predict which one is most likely correct before running any tests.

    ATLAS also asks the model for an embedding of what it just wrote, which acts as a fingerprint. Two similar pieces of code produce similar fingerprints, and a well-written, confident solution produces a different fingerprint than a confused, buggy one.

    These fingerprints get fed into a separate, much smaller neural network called the Cost Field. This little network was trained ahead of time on examples where they already knew which solutions were correct and which were wrong. It learned to assign a score to each fingerprint. Correct solutions get a low score and incorrect ones get a high one.

    So the process is to generate multiple solutions, get their fingerprints, score each one, and pick the lowest. Only that one gets tested. The Cost Field picks correctly about 88% of the time according to the repo.
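    The whole pipeline can be sketched like this. Everything here is a stand-in, not the actual ATLAS code: `generate_solution` fakes the LLM call, `embed` fakes the model's embedding, and the "Cost Field" is just a random linear scorer where the real one is a small pretrained network.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def generate_solution(task, attempt):
        # Stand-in for an LLM call; returns candidate code as a string.
        return f"def solve(x):  # attempt {attempt}\n    return x + {attempt}"

    def embed(code):
        # Stand-in for the model's embedding of its own output (the fingerprint).
        return rng.normal(size=16)

    # "Cost Field" stand-in: scores a fingerprint, low = likely correct.
    w_cost = rng.normal(size=16)

    def cost(fingerprint):
        return float(fingerprint @ w_cost)

    def run_tests(code):
        # Stand-in for the sandboxed test suite -- the expensive step.
        return True

    candidates = [generate_solution("task", i) for i in range(4)]
    scores = [cost(embed(c)) for c in candidates]
    best = candidates[int(np.argmin(scores))]  # cheapest step: score all four
    passed = run_tests(best)                   # expensive step: test only one
    ```

    The point of the structure is visible in the last two lines: scoring is a tiny matmul per candidate, while testing means spinning up a sandbox, so only the lowest-cost candidate ever reaches it.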









  • Completely agree, and now it’s just hip to say how much you hate AI. This kind of performative action doesn’t really accomplish anything, but it lets people feel good about themselves and gain social acceptance. Actually building an alternative takes work. The whole Linux analogy is very apt here because we’ve always had alternatives to corporate offerings, but most people don’t want to invest the time into learning how to use them.


  • I’d argue it’s inevitable for the simple reason that the whole AI-as-a-service business model is a catch-22. Current frontier models aren’t profitable, and all the current service providers live off VC funding. But if models become cheap enough to be profitable, then they’re cheap enough to run locally too. And there’s little reason to expect that models won’t keep getting optimized, so we’re going to hit an inflection point where local becomes the dominant paradigm.

    We’ve seen the pendulum swing between mainframe and personal computer many times before. I expect this will be no different.




  • Seems to me there’s a huge amount of incentive for Chinese companies to pursue these things, since China isn’t investing in massive data centre build-outs the way the US is, and their chips are still behind. Another major application is robotics, where on-device resources are inherently limited; the only path forward there currently is making the software side more efficient. It also looks like Chinese companies are embracing the whole open-weights approach and treating models as shared infrastructure rather than something to be monetized directly.

    And local models have been improving at a really fast pace in my opinion. Stuff like Qwen 3.5 is not even comparable to the best models you could run locally a year ago.



  • Right, so far no American company has managed to make any actual profit from selling LLMs as a service, and the cost of operating the data centres is literally an order of magnitude higher than the revenue they pull in. And the kicker is that if models get efficient enough to bring the costs down, then they become efficient enough to run locally. So the whole business model fundamentally doesn’t make sense: either it’s too expensive to operate, or nobody will want to use it as a service because running your own gives you privacy and flexibility.