Aggressive AI scrapers are making it kinda suck to run wikis

lemmydividebyzero@reddthat.com · 2 months ago

tal@lemmy.today · edit-2 2 months ago

Detect whether the user is a human, and if it isn't, don't block the request (blocking is obvious to the scraper operator and will just push the bot developers to work on better human emulation until they get the data). Poison the response instead. Just as blocking scrapers is hard for website operators, separating useful data from not-useful data is hard for people building AI training corpora.

https://www.cloudflare.com/learning/ai/data-poisoning/

Data poisoning involves injecting malicious information into training datasets to manipulate an AI model’s behavior, compromising its accuracy, reliability, and the overall integrity of machine learning results.
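
For illustration, here's a rough Python sketch of the "poison instead of block" idea. Everything in it is an assumption made up for the example: the `looks_like_scraper` User-Agent check is a placeholder (real detection would need many more signals), and `poison` just shuffles a fraction of the words, which is only one crude way to degrade text. It is not how any particular wiki or service does it.

```python
import random
import zlib

# Hypothetical markers; a real deployment would look at request rate,
# behaviour, IP reputation and so on, not just the User-Agent string.
SUSPECT_UA_MARKERS = ("bot", "crawler", "spider", "scrapy", "python-requests")


def looks_like_scraper(headers: dict) -> bool:
    """Crude guess at whether a request comes from an automated client."""
    ua = headers.get("User-Agent", "").lower()
    return ua == "" or any(marker in ua for marker in SUSPECT_UA_MARKERS)


def poison(text: str, seed: int, swap_fraction: float = 0.3) -> str:
    """Permute a fraction of the words so the page keeps its length and
    vocabulary but no longer reads correctly."""
    words = text.split()
    rng = random.Random(seed)
    count = int(len(words) * swap_fraction)
    sources = rng.sample(range(len(words)), count)
    targets = sources[:]
    rng.shuffle(targets)
    out = words[:]
    for src, dst in zip(sources, targets):
        out[dst] = words[src]
    return " ".join(out)


def render(headers: dict, page_text: str) -> str:
    """Serve the real page to likely humans and a degraded copy otherwise.
    The response looks the same from the outside either way (no block page)."""
    if looks_like_scraper(headers):
        # Seed from the content so repeated fetches return the same poisoned
        # copy, which makes the tampering harder to spot by diffing.
        seed = zlib.crc32(page_text.encode("utf-8"))
        return poison(page_text, seed)
    return page_text
```

The point of seeding from the page content is that a scraper fetching the same page twice gets identical output, so it has no easy signal that anything was altered.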

ag10n@lemmy.world · 2 months ago

This is the way. Computer use agents are common and can easily ‘browse’ to a page and grab the content.