The Editors Protecting Wikipedia from AI Hoaxes

silence7@slrpnk.net · 11 months ago


ɯᴉuoʇuɐ@lemmy.dbzer0.com · 11 months ago

As far as Wikipedia is concerned, there is pretty much no way to use LLMs correctly, because probably each major model includes Wikipedia in its training dataset, and using WP to improve WP is… not a good idea. It probably doesn’t require an essay to explain why it’s bad to create and mechanise a loop of bias in an encyclopedia.

Cocodapuf@lemmy.world · 11 months ago

“and using WP to improve WP is… not a good idea.”

That is not inherently true. For example, I once read a Wikipedia article in which a chart was simply incomplete: some entries were left blank, even though I knew that data existed. All I had to do was look up those exact items elsewhere on Wikipedia, and the correct numbers were there, readily available.

I think that was when I first created a Wikipedia account for editing. There was an article clearly missing information, and I knew it would be both uncontroversial and quite easy to fill in that information.

My point is, that first article could definitely be meaningfully improved, using only information already available on Wikipedia.

FaceDeer@fedia.io · 11 months ago

You’re probably assuming that someone would just go to an LLM and say “write a Wikipedia article about subject X”? That wouldn’t work well, but that’s very far from the only way to use LLMs for Wikipedia work.

For starters, it doesn’t have to actually write content at all. You could paste an existing article into an LLM and ask it “What facts in this article lack references to back them up? Are there any weasel-worded statements, or statements that don’t appear to follow a neutral point of view?” And get lists of things that require attention.

Or you could paste a poorly-worded article in and tell it to rewrite it with all the same information but better phrasing or structure. You could put a bunch of research materials you’ve gathered into the LLM’s context and tell it to write a summary in the style of a Wikipedia article, with references to the sources for each fact mentioned. Obviously you’d check the LLM’s work afterward and probably do some manual editing, but this would be a great time and effort saver to get a first draft written. You could take an existing article and tell the LLM that some particular fact had changed or been discovered to be incorrect and ask it to rewrite the relevant parts to account for that.

Wikipedia is in many, many languages. You could have a multilingual LLM automatically compare the contents of different language versions of a Wikipedia article and ask it to spot differences in content or tone. You could have an LLM translate an article from one language to another as a starting point for creating an article in that new language.

You could have the LLM check the references of an existing article - look up each referenced work on the web and see whether it genuinely says what the article that’s using it as a reference says. It could flag all manner of subtle problems that way. Perhaps the reference sounds biased, or whoever used it as a reference misinterpreted it, or the link was simply incorrect and points to unrelated material. Being able to have an AI do a first-pass check of all that in a completely automated way would save huge amounts of time.

This is all just brainstorming off the top of my head, so I’m sure there’s plenty of other good uses that aren’t coming to mind.

ɯᴉuoʇuɐ@lemmy.dbzer0.com · 11 months ago

I don’t get the impression you’ve ever made any substantial contributions to Wikipedia, and thus you have misguided ideas about what would actually be helpful to the editors and conducive to producing better articles. Your proposal about translations is especially telling, because machine-assisted translations (i.e. with built-in tools) existed on WP long before the recent explosion of LLMs.

In short, your proposals either: 1. already exist, 2. would still risk distortion, oversimplification, made-up bullshit and feedback loops, 3. are likely very complex and expensive to build, or 4. are straight up impossible.

Good WP articles are written by people who have actually read some scholarly articles on the subject, including those that aren’t easily available online (so LLMs are massively stunted by default). Having an LLM re-write a “poorly worded” article would at best be like polishing a turd (poorly worded articles are usually written by people who don’t know much about the subject in the first place, so there’s not much material for the LLM to actually improve), and more likely it would introduce a ton of biases on its own (as well as the usual asinine writing style).

Thankfully, as far as I’ve seen the WP community is generally skeptical of AI tools, so I don’t expect such nonsense to have much of an influence on the site.

FaceDeer@fedia.io · 11 months ago

Heh. I fell off of contributing in recent years, but there was a time back in the day when my edit count was in the top hundred or so. Your impression is completely wrong.

Anyway, this discussion here isn’t going to affect what the people on Wikipedia are doing, so it doesn’t really matter. I linked to the project page above and it’s quite clear that even this “AI Cleanup” project is not in any way fundamentally opposed to using AI; they’re just focused on ensuring that editors using it are adhering to Wikipedia’s guidelines. If you think AI can’t do that, then clearly your concept of how AI is useful is too limited.