New York is set to become the first state to impose guardrails on “stealth crawlers,” or unauthorized software that trawls news sources to scrape content.
Realistically, what good would it do? Once you've already scraped the pattern of news sites, it's already over. All this actually does is prevent new startups from competing in the AI space, so really this is the fastest enshittification of a medium on record. Whether you like or hate AI, this is an enshittification of it. (I hate AI.)
What are you on about?
For AI purposes the really useful part of a news site is the actual news - you know, the stuff that changes practically every minute - not the “structure” of the site.
These news sites aren't being scraped for training data anymore, but to provide near-real-time, up-to-date information to the models.
Meaning that, for example, Gemini can scan your news article, extract the useful information, and deliver it to the user without them ever visiting your news site and providing the interaction that, at the end of the day, is converted to money - money your site needs to run.
I'm guessing it's meant to address the problem of a site not getting clicks because the article you were about to read has already been summarized for you. It also opens the door to revenue negotiations over allowing content to be scraped for that purpose, since the scraper bots would now be identified.