Aggressive AI scrapers are making it kinda suck to run wikis

lemmydividebyzero@reddthat.com · 2 months ago

Jason2357@lemmy.ca · 2 months ago

The issue with wikis and source forges is that there is a maze of links to all past versions of everything, each generated on demand from a CPU-expensive database query. You basically have to limit the pages anonymous users can spider into. Forgejo has a setting to block expensive pages from non-logged-in users, for example.
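
To make the idea concrete, here is a rough Python sketch of gating expensive pages behind a login. It is not Forgejo's actual setting or code. The URL patterns and the `allow_request` helper are made up for illustration, and a real deployment would match whatever routes its own software uses.

```python
import re

# Hypothetical URL patterns for pages that trigger expensive queries
# (history, blame, diffs, old revisions). Adjust to your own routes.
EXPENSIVE_PATTERNS = [
    re.compile(r"/commits?/"),
    re.compile(r"/blame/"),
    re.compile(r"/compare/"),
    re.compile(r"[?&]action=history\b"),
    re.compile(r"[?&]oldid=\d+"),
]


def is_expensive(url: str) -> bool:
    """Return True if the URL matches any of the expensive-page patterns."""
    return any(p.search(url) for p in EXPENSIVE_PATTERNS)


def allow_request(url: str, logged_in: bool) -> bool:
    """Logged-in users see everything; anonymous users only get cheap pages."""
    return logged_in or not is_expensive(url)
```

Anonymous visitors can still read current pages, so ordinary readers and search engines are mostly unaffected. Only the deep history maze is closed off.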

tal@lemmy.today · 2 months ago

I could see maybe caching that and providing it to a not-clearly-human user if it is in cache. That lets someone do something like link to a particular version of a file in a discussion here on the Threadiverse. The first user loading it will cause it to be cached.
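
A minimal sketch of what I mean, in Python. The `looks_human` flag is an assumption: it stands in for whatever heuristic or login check you trust. The class and method names are invented for this example, and the in-memory dict is a placeholder for a real cache with eviction.

```python
from typing import Callable, Optional


class CacheOnlyForBots:
    """Render on demand for likely humans; serve others from cache only."""

    def __init__(self, render: Callable[[str], str]):
        self._render = render  # the expensive page renderer
        self._cache = {}       # url -> rendered page (no eviction in this sketch)

    def get(self, url: str, looks_human: bool) -> Optional[str]:
        # Anyone may receive a page that is already cached.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: only a likely-human visitor may trigger a render.
        if not looks_human:
            return None  # caller could answer with 403 or 429 here
        page = self._render(url)
        self._cache[url] = page
        return page
```

So a link to an old file version works for everyone once one human has followed it, while a crawler walking uncached history pages never reaches the database.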

Jason2357@lemmy.ca · 2 months ago

Sure. It’s just the thousands of obscure page edit history pages that AI crawlers hit every hour that cause the problem.

bountygiver [any]@lemmy.ml · 2 months ago

You can probably configure Anubis to require a challenge that is proportional to the CPU time needed to render each page?
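
Roughly what I have in mind, as a Python sketch. This is not Anubis configuration or its API, just the arithmetic. With a hash-based proof of work that requires `n` leading zero bits, the expected client work is about `2**n` hashes, so the difficulty should grow with the logarithm of the render cost. The `hashes_per_ms` rate and the clamp limits are assumed numbers.

```python
import math


def difficulty_bits(render_ms: float,
                    hashes_per_ms: float = 1000.0,  # assumed client hash rate
                    min_bits: int = 8,
                    max_bits: int = 24) -> int:
    """Pick a leading-zero-bit difficulty so that the expected client work
    (about 2**bits hashes) roughly matches the server's render time."""
    target_hashes = max(1.0, render_ms * hashes_per_ms)
    bits = math.ceil(math.log2(target_hashes))
    # Clamp so cheap pages still get a token challenge and
    # expensive pages stay solvable on a phone.
    return max(min_bits, min(max_bits, bits))
```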

Jason2357@lemmy.ca · 2 months ago

That seems to be a lot of people’s approach, but if they cared about time or bandwidth they wouldn’t be spidering down into your commit history multiple times a day. They have more patience and resources than your human readers.