Bots are currently scraping the internet for LLM training data at unprecedented rates[1][2][3], driving up costs and destabilizing public-facing websites. I want to talk about how this has been particularly difficult for wikis, and how it has gotten much worse in the last few months.
I could see maybe caching that and serving it to a not-clearly-human user only if it is already in the cache. That lets someone do something like link to a particular version of a file in a discussion here on the Threadiverse. The first user to load it will cause it to be cached.
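
To make that concrete, here is a rough sketch of the "serve from cache only" idea in Python. Everything in it is made up for illustration: `CacheOnlyGate`, the `render` callback, and the `looks_human` flag are hypothetical names, and how a site actually decides that a client looks human is left entirely out. It is a sketch of the policy, not how any particular wiki or forge implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CacheOnlyGate:
    """Serve expensive pages to suspected bots only when already cached."""

    # Hypothetical expensive renderer, e.g. building an old revision of a file.
    render: Callable[[str], str]
    # In-memory cache for the sketch; a real site would use something bounded.
    cache: Dict[str, str] = field(default_factory=dict)

    def handle(self, path: str, looks_human: bool) -> Optional[str]:
        # Anything already cached is cheap, so every client may have it.
        if path in self.cache:
            return self.cache[path]
        # Cache miss from a not-clearly-human client: refuse to do the work.
        # The caller could turn None into a 403 or 429 response.
        if not looks_human:
            return None
        # Cache miss from a likely human: render once and remember the result.
        page = self.render(path)
        self.cache[path] = page
        return page
```

With this policy, a link posted in a discussion works for everyone once a single likely-human visitor has followed it, while a crawler walking every historical revision only ever gets what people have already asked for.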