Wikipedia probably wants to sell LLM developers access to its content for training. That access is only valuable if Wikipedia remains a high-quality, slop-free source.
I think even AI zealots agree there should be silos of fully human-generated content to train on. Training slop on slop makes the slop even worse.
Sell licenses of what? It’s already all under Creative Commons, iirc.
The content is CC licensed, but they are trying to block AI scraping because it overloads their servers. They have a paid API that uses a lot less compute for both Wikipedia and the AI companies, and it is also a revenue source for Wikipedia.
AI already trains on Wikipedia.
https://commoncrawl.org/
This was only done because the editors pushed to minimize AI involvement. There’s a comment here already mentioning that: https://lemmy.world/comment/22826863