Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology, politics, and science fiction.

Spent many years on Reddit before joining the Threadiverse as well.

  • 0 Posts
  • 1.05K Comments
Joined 2 years ago
Cake day: March 3rd, 2024






  • Wikipedia’s traditional self-sustaining model works like this: volunteers (editors) write and improve articles for free, motivated by idealism and the desire to share knowledge. This high-quality content attracts a massive number of readers from search engines and direct visits. Among those millions of readers, a small percentage are inspired to become new volunteers/editors, replenishing the workforce. This cycle is “virtuous” because each part fuels the next: great content leads to more readers, which leads to more editors, which leads to even better content. AI tools (like ChatGPT, Google AI Overviews, Perplexity, etc.) disrupt this cycle by intercepting users before they reach Wikipedia.


  • A week or two back there was a post on Reddit advertising a project someone had put up on GitHub, and when I went to look at it I didn’t find any documentation explaining how it actually worked - just how to install it and run it.

    So I gave Gemini the URL of the repository and asked it to generate a “Deep Research” report on how it worked. Got a very extensive and detailed breakdown, including some positives and negatives that weren’t mentioned in the existing readme.
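
    The same general idea can be approximated programmatically. The sketch below is only a loose stand-in for the web app’s “Deep Research” feature (which does its own browsing): it pulls a repository’s root file listing and README through GitHub’s public REST API and asks a Gemini model, via the google-generativeai Python SDK, to explain how the project works. The repository name, API key, and model name are placeholders/assumptions.

    ```python
    import base64
    import json
    import urllib.request

    import google.generativeai as genai  # pip install google-generativeai

    # Placeholder values - swap in the real repository and your own API key.
    OWNER, REPO = "someuser", "someproject"
    genai.configure(api_key="YOUR_API_KEY")

    def github_get(path: str):
        """Fetch a JSON document from GitHub's public REST API."""
        url = f"https://api.github.com/{path}"
        req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # Root file listing, plus the README (GitHub returns its content base64-encoded).
    listing = github_get(f"repos/{OWNER}/{REPO}/contents")
    file_names = [item["name"] for item in listing]
    readme = github_get(f"repos/{OWNER}/{REPO}/readme")
    readme_text = base64.b64decode(readme["content"]).decode("utf-8", errors="replace")

    prompt = (
        "Here are the file listing and README of a GitHub repository.\n"
        f"Files: {', '.join(file_names)}\n\nREADME:\n{readme_text}\n\n"
        "Explain how the project appears to work internally, and note any "
        "strengths or weaknesses the README itself doesn't mention."
    )

    model = genai.GenerativeModel("gemini-1.5-pro")  # model name is an assumption
    print(model.generate_content(prompt).text)
    ```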








  • Only 12 percent reported both lower costs and higher revenue, while 56 percent saw neither benefit. Twenty-six percent saw reduced costs, but nearly as many experienced cost increases.

    So 38% saw benefits from AI, whereas “nearly” 26% saw cost increases from it. One could just as easily write the headline “More companies experience increased benefits from AI than experience increased costs” based on this data, but that headline wouldn’t get so many clicks.
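
    Spelling out the arithmetic behind that reading (the assumption, implied by the comment, is that the 12% and 26% groups don’t overlap):

    ```python
    # Figures as quoted above; categories treated as non-overlapping.
    both_lower_costs_and_higher_revenue = 12
    reduced_costs = 26
    neither_benefit = 56        # quoted for completeness; not used in the sum
    cost_increases = 26         # "nearly as many" as the 26% with reduced costs

    saw_some_benefit = both_lower_costs_and_higher_revenue + reduced_costs
    print(f"Saw some benefit: {saw_some_benefit}%")              # 38%
    print(f"Saw cost increases: just under {cost_increases}%")
    ```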






  • Raw materials to inform the LLMs constructing the synthetic data, most likely. If you want it to be up to date on the news, you need to give it that news.

    The point is not that the scraping doesn’t happen; it’s that the data is already being highly processed and filtered before it gets to the LLM training step. There’s a ton of “poison” in that data naturally already. Early LLMs like GPT-3 just swallowed the poison and muddled on, but researchers have learned how much better LLMs can be when trained on cleaner data, and so they already take steps to clean it up.
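
    To make “take steps to clean it up” concrete, here is a toy sketch of the kind of heuristic filtering pass that sits between scraping and training. The signals, thresholds, and marker strings are illustrative assumptions, not any lab’s actual pipeline.

    ```python
    import re

    # Marker strings that usually indicate boilerplate rather than prose
    # (illustrative examples only).
    BOILERPLATE_MARKERS = ("click here", "terms of service", "lorem ipsum")

    def looks_clean(doc: str) -> bool:
        """Crude per-document quality checks; thresholds are made up."""
        words = doc.split()
        if len(words) < 50:                       # too short to be useful prose
            return False
        if len(set(words)) / len(words) < 0.3:    # highly repetitive / spammy
            return False
        alpha_ratio = sum(c.isalpha() for c in doc) / max(len(doc), 1)
        if alpha_ratio < 0.6:                     # mostly markup, numbers, or junk
            return False
        lowered = doc.lower()
        return not any(marker in lowered for marker in BOILERPLATE_MARKERS)

    def filter_corpus(scraped_docs):
        """Drop low-quality documents and exact duplicates - a stand-in
        for the filtering and deduplication step described above."""
        seen = set()
        for doc in scraped_docs:
            key = re.sub(r"\s+", " ", doc).strip().lower()
            if key in seen or not looks_clean(doc):
                continue
            seen.add(key)
            yield doc
    ```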