

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.
Spent many years on Reddit before joining the Threadiverse as well.


AI is a technology being developed and deployed by millions of people and thousands of corporations, across a huge number of countries. Users can probably be counted in the hundreds of millions now. Which ones’ “end goal” is this?


It’s been darkly amusing watching the various social media hive-minds that used to be all for the concept of “information wanting to be free” suddenly discovering that they hate AI more than they love freedom of information.


This is exactly why I’ve been recording personal logs and saving archives of all of my digital interactions for over a decade already. Going to make one of these for myself at some point using local models.
They laughed, called me mad. Well, who’s mad now?
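If anyone wants a starting point for the same idea, here’s a minimal sketch of the sort of thing I have in mind, assuming the archive is plain text files and a local model is being served by something Ollama-compatible on localhost. The paths, model name, and keyword scoring are all placeholders, not a finished pipeline.

```python
import json
import urllib.request
from pathlib import Path

ARCHIVE_DIR = Path("~/personal-archive").expanduser()  # placeholder location for the saved logs
OLLAMA_URL = "http://localhost:11434/api/generate"     # assumes an Ollama-style local server
MODEL = "llama3"                                        # whichever local model you have pulled


def retrieve(query: str, top_n: int = 5) -> list[str]:
    """Very naive keyword retrieval over plain-text archive files."""
    terms = set(query.lower().split())
    scored = []
    for path in ARCHIVE_DIR.rglob("*.txt"):
        text = path.read_text(errors="ignore")
        score = sum(text.lower().count(t) for t in terms)
        if score:
            scored.append((score, text[:2000]))  # truncate long files to keep the prompt small
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


def ask(query: str) -> str:
    """Stuff the retrieved snippets into a prompt and ask the local model."""
    context = "\n---\n".join(retrieve(query))
    prompt = (
        "You are a personal archive assistant. Using only these excerpts "
        f"from my logs:\n{context}\n\nAnswer: {query}"
    )
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask("What was I working on in 2019?"))
```

A real version would want proper embeddings and chunking rather than keyword counts, but the overall shape is the same: retrieve from your own archive, then hand the context to a model running on your own hardware.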


An attacker could trick a user into clicking a malicious link inside a Markdown file opened in Notepad
So you can give someone a Markdown file with a link to an application, and if they click the link the application runs.
Markdown supports links, yeah.
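For illustration, the kind of file in question could be as simple as this; the link target here is hypothetical, not the actual proof of concept:

```markdown
Quarterly numbers are attached below.

<!-- hypothetical target path, for illustration only -->
[Open the full report](file:///C:/Users/Public/report_viewer.exe)
```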


Sure, I’m not saying this isn’t “malicious.”
I’m questioning why this particular instance of lawbreaking makes his site an “unreliable source”, whereas all the copyright violation he’s been up to all along didn’t? And now you’re bringing in speculative instances of future lawbreaking that also seem unrelated, what does crypto mining have to do with the reliability of the sources archived there?
My point here is that people are jumping from “he did something bad that I don’t like!” to “therefore everything he does is bad and wrong!” without a clear logical connection between those things. Sure, the DDoS thing is a good reason to try to avoid sending traffic to his site. But that has nothing to do with the reliability of the information stored there.


Is it really an “unreliable source”, though? The owner of the site is acting maliciously with regard to this DDoS, of course, but that doesn’t necessarily mean he’s going to act maliciously when it comes to the contents of archive.today itself.
One could make the case that the owner of archive.today was already flagrantly flouting copyright law, and therefore a criminal, and therefore “unreliable” right from the get-go. Let’s not leap to conclusions here.


You’re misinterpreting what Wikimedia’s “free knowledge” mandate is about. They have a hard-line requirement that the knowledge they distribute is legally free, for example - it has to be under an open license. archive.today is quite the opposite of that. Wikimedia doesn’t just archive any old knowledge willy-nilly; they’ve got standards. And so forth.
Simply running an archive.today clone would not fit. The “source documents only” archive would already be stretching the edges rather far. There’s already Wikisource, for example, and it’s got the “open licenses only” restriction.


I think that’d go pretty far beyond Wikimedia’s mandate, but having something whose purpose was specifically archiving just the sources for their articles would be pretty awesome.
And remember: if you’re not running her on your own hardware, she’s not an AI girlfriend. She’s an AI prostitute.


It’s weird how AI has turned so much of the internet away from its generally anti-copyright stance. I’ve seen threads in piracy and datahoarding communities that were riddled with “won’t someone please think of the copyright!” posts raging about how awful AI was.
I maintain the same view I always have. Copyright is indeed broken, because of how overly restrictive and expansive it has become. Most people long ago lost sight of what it’s actually for.


If they do, it’s not by the actual training of AI.


Simple.wikipedia isn’t a summary of regular Wikipedia, it’s a whole separate thing. It’s intended to convey the same data, just in a simpler way.


The problem being discussed here is not the availability of Wikipedia’s data. It’s about the ongoing maintenance and development of that data going forward. Having a static copy of Wikipedia gathering dust on various people’s hard drives isn’t going to help that.


Wikipedia’s traditional self-sustaining model works like this: volunteers (editors) write and improve articles for free, motivated by idealism and the desire to share knowledge. That high-quality content attracts a massive number of readers from search engines and direct visits, and among those millions of readers a small percentage are inspired to become new editors themselves, replenishing the workforce. The cycle is “virtuous” because each part fuels the next: great content leads to more readers, which leads to more editors, which leads to even better content.

AI tools (like ChatGPT, Google AI Overviews, Perplexity, etc.) disrupt this cycle by intercepting the user before they ever reach Wikipedia.
A week or two back there was a post on Reddit where someone was advertising a project they’d put up on GitHub, and when I went to look at it I didn’t find any documentation explaining how it actually worked - just how to install it and run it.
So I gave Gemini the URL of the repository and asked it to generate a “Deep Research” report on how it worked. Got a very extensive and detailed breakdown, including some positives and negatives that weren’t mentioned in the existing readme.
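For anyone who’d rather script something like this than use the Deep Research UI, here’s a rough sketch. It assumes the google-generativeai Python package and only feeds the model the repo’s README fetched via the GitHub API, so it’s a much shallower approximation of what Deep Research actually does; the repo name, model name, and prompt are placeholders.

```python
import urllib.request

import google.generativeai as genai  # assumes the google-generativeai package is installed

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

REPO = "someuser/someproject"  # placeholder repository

# Fetch the repo's README in raw form via the public GitHub API.
req = urllib.request.Request(
    f"https://api.github.com/repos/{REPO}/readme",
    headers={"Accept": "application/vnd.github.raw+json"},
)
with urllib.request.urlopen(req) as resp:
    readme = resp.read().decode()

# Ask the model for an explanation of how the project works, plus pros and cons.
model = genai.GenerativeModel("gemini-1.5-flash")  # whichever model you have access to
report = model.generate_content(
    "Here is the README of a GitHub project:\n\n"
    f"{readme}\n\n"
    "Explain how the project actually works internally, and list its "
    "strengths and weaknesses beyond what the README already says."
)
print(report.text)
```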


Oh boy, I bet the comments on this one will be useful.


LLMs were trained on our social media feeds, after all…


Ah, low numbers of seeds. Must’ve just not wanted to wait.


One of my happy imaginings is that perhaps someday my AI simulant will be in a museum somewhere, getting to chat with whatever entities descend from us to compare and contrast how things are now with how they are then, whenever that is.