• tal@lemmy.today
    3 hours ago

    Detect whether the client is a human or a bot, but instead of blocking bot requests, poison the response. Blocking is obvious to the scraper operator and will just push the bot developers to work on better human emulation until they get the data. Just as blocking scrapers is hard for website operators, separating useful data from useless data is hard for people building AI training corpora.

    https://www.cloudflare.com/learning/ai/data-poisoning/

    Data poisoning involves injecting malicious information into training datasets to manipulate an AI model’s behavior, compromising its accuracy, reliability, and the overall integrity of machine learning results.
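
To make the idea concrete, here is a rough Python sketch of the "answer normally, but poison the body" approach. Everything in it is an assumption for illustration: `looks_like_scraper`, `poison`, and `build_response` are hypothetical names, the User-Agent check is a placeholder for real bot detection (which is much harder), and shuffling words is only a crude stand-in for an actual poisoning strategy.

```python
import random

# Hypothetical marker list, for illustration only. A real scraper can
# trivially spoof its User-Agent, so this is not a serious detector.
SUSPECT_AGENT_MARKERS = ("bot", "crawler", "spider", "scrapy")


def looks_like_scraper(headers: dict[str, str]) -> bool:
    """Placeholder heuristic: flag a request whose User-Agent has a known marker."""
    agent = headers.get("User-Agent", "").lower()
    return any(marker in agent for marker in SUSPECT_AGENT_MARKERS)


def poison(text: str, seed: int = 0) -> str:
    """Return the same words in a shuffled order (a crude stand-in for poisoning)."""
    words = text.split()
    rng = random.Random(seed)  # seeded so the output is repeatable
    rng.shuffle(words)
    return " ".join(words)


def build_response(headers: dict[str, str], body: str) -> tuple[int, str]:
    """Always answer 200; only the body differs for suspected scrapers."""
    if looks_like_scraper(headers):
        return 200, poison(body)
    return 200, body


if __name__ == "__main__":
    page = "The quick brown fox jumps over the lazy dog near the river bank"
    print(build_response({"User-Agent": "Mozilla/5.0"}, page))
    print(build_response({"User-Agent": "ExampleBot/1.0"}, page))
```

The point of the sketch is that both branches return the same status code, so the scraper operator gets no obvious signal that anything was withheld.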