Selfhosted & AI - Part 2: The Results

curbstickle@anarchist.nexus · edited · 22 days ago

curbstickle@anarchist.nexus · 29 days ago

As a thought, if a repo is already using ai-declaration.md or a similar ai disclosure, I think posting a link to that declaration as the reply to the AIP comment should count as the declaration reply, since they are already providing that information.

irmadlad@lemmy.world · 29 days ago

You’re doing a good job herding cats @[email protected]. I don’t envy you.

speculate7383@lemmy.today · 28 days ago

@[email protected] This is a very reasonable approach. Many people don’t appreciate the effort involved to be a moderator, so let me just say thank you for your work here!

irmadlad@lemmy.world · 28 days ago

Many people don’t appreciate the effort involved to be a moderator,

Oh shit, I do…it’s the reason I’m not one anymore. I have lost 90% of my patience in my old age.

mic_check_one_two@lemmy.dbzer0.com · 28 days ago

I think this is a very well thought out approach to handling it. I can’t personally think of any better solutions, at least. I probably would have chosen some different phrasing for the tags, (CBH feels… Disconnected? I’d probably go with something like “No AI” or “AI-Free” instead), but that’s just a matter of personal diction. Outright banning posts about projects that use AI likely isn’t going to be feasible in the long run, and I think that simple declaration requirements will go a long way towards encouraging people to actually disclose their usage.

If you outright ban it, people will simply hide their usage. It feels like it’s akin to the US War On Drugs^TM in that way. If you allow it and simply require responsible disclosure, more people will be inclined to be upfront about it. And that allows projects to be more accurately audited and vetted. The same way the war on drugs consolidated power to organized gangs (by making them the only ones capable of producing and transporting illegal drugs at scale), an outright ban on AI would only encourage people to hide their usage.

One potential way I see people trying to skirt the rules regarding self-promos is via proxy/strawman accounts. It would be trivial for me to spin up a dummy account and post my own project as an “I found this cool project but don’t have to disclose my AI use because I didn’t make it” post. I don’t personally have any projects in the works to post about, but I can easily see someone using it to try and skirt the disclosure requirements. Especially when we have seen situations like the (now infamous) Huntarr debacle, where the vibe-coder dev was actively avoiding AI disclosures. Because they knew it would tank the project’s popularity if people knew it was vibe coded.

I’m not sure if there is a good solution for this potential issue, except maybe to limit posts by new users. But even that is trivial to bypass. If you limit them based on account age, simply making a few strawman accounts and waiting for them to age is easy. Hell, I already have a few old throwaway accounts that I could swap over to whenever I want, and I’m not even planning anything nefarious.

There are similar problems with restricting users based on post/comment count, as that will likely stifle discussion from new users who are trying to be active in the community. One of the more frustrating parts about Reddit was that many of the most popular subs banned posts from users who were below a certain post karma threshold or who didn’t have enough previous posts. It created a catch-22 where you needed to have a few popular posts before you were allowed to make any posts. So there were people posting on random niche subs, simply for karma farming before they could then post on the larger subs. And if I was a vibe coder without scruples who is already looking to skirt the rules, it would be trivial for me to spin up an LLM and let it make a few comments before I start using it as a dummy account.

This may end up being a non-issue in the grand scheme of things. But I figured I’d mention it, because I genuinely don’t see a good solution for patching the big glaring hole in the self-promo rule. You’re absolutely correct that requiring disclosure for every post is unrealistic, because lots of users who post projects here aren’t the devs. They just stumbled across a cool project and wanted to share it, and they have no realistic way of knowing if the project uses AI. And if you restrict promo posts to only devs, you’ll only get posts from the people who fall into the (likely very small) overlapping section on the “is a Lemmy user” and “makes projects” Venn diagram. Lemmy is already a small community in the grand scheme of things. And restricting promo posts to only the people actively developing the projects would make it feel even smaller.

If I do use mine, I’ll put it up on codeberg so everyone can see exactly what it’s doing… and then get mad and tell me there is a better way.

Poe’s Law is always in effect. The best way to get an answer on the internet is not to ask a question; it is to post an incorrect answer, because people will go out of their way to correct you.

irmadlad@lemmy.world · 28 days ago

it is to post an incorrect answer

60% of the time, it works every time.

mic_check_one_two@lemmy.dbzer0.com · 28 days ago

I’m honestly shocked that nobody has corrected my incorrect usage of Poe’s Law.

irmadlad@lemmy.world · 28 days ago

Slow day

brucethemoose@lemmy.world · edited · 29 days ago

Failure to provide a disclosure after using the tag would mean removing the post. It could be locked, but I would have to assume the majority of the spam-type postings that happened to make it past the rule 7 criteria are the ones who will not provide the requested disclosure. I think it makes for a good filter this way, but please comment if you think otherwise.

Sounds reasonable to me!

I think the major choice is for y’all (the mod team), as enforcing a tagging system is going to increase the moderation workload. Though I guess it would cut back on AI reports, like you said.

I have no recommendations for an existing bot.

…You could use an embeddings model for a little extra automation though.

This is a pre-LLM thing, but basically you could feed a script new untagged posts, use an embeddings model to compare the text of their bodies to a keyword (“AI”?), and spit out a number as a rough “similarity” metric. If it’s above a certain threshold (e.g. if the post seems AI related), send a message to the moderation team to check it, or maybe even post a rules reminder in the comments.

And FYI, embeddings models are tiny, so it doesn’t need special resources to run or anything.
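A rough sketch of that similarity check, for illustration only. The `embed` function here is a toy bag-of-words stand-in so the example runs with nothing but the standard library; in practice it would be swapped for a real embeddings model. The keyword string, the threshold, and all function names are assumptions, not anything the mod team uses.

```python
import math
import re
from collections import Counter

# Hypothetical values; both would need tuning against real posts.
KEYWORDS = "ai llm vibe coded claude copilot agent generated"
THRESHOLD = 0.15


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def needs_mod_check(body: str, threshold: float = THRESHOLD) -> bool:
    """True when an untagged post body looks AI-related enough to flag for a human."""
    return cosine(embed(body), embed(KEYWORDS)) >= threshold
```

A flagged post would only be queued for a moderator to look at; nothing in this sketch removes or tags anything on its own.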

curbstickle@anarchist.nexus · 29 days ago

Don’t think I need the model tbh, I’m generally on enough to address the untagged. The annoying part would be making the same comment over and over again (thus the short bit of python)
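For anyone curious, a script like that could be as small as the sketch below. This is not the moderator's actual code: the tag list is taken from the tags named in this thread, while the reminder wording and the function names are made up for illustration. Posting the comment is left out on purpose, since that depends on whichever Lemmy client is in use.

```python
import re

# Tags named in this thread; the exact set is an assumption.
KNOWN_TAGS = ("AIP", "AIT", "CBH")
TAG_PATTERN = re.compile(r"\[(" + "|".join(KNOWN_TAGS) + r")\]", re.IGNORECASE)

# Placeholder wording, not the community's real reminder text.
REMINDER = (
    "Hi {author}, this looks like a promo post without a tag. "
    "Please add one of {tags} to the title and reply with your AI disclosure."
)


def find_tag(title):
    """Return the first known tag in a post title (upper-cased), or None."""
    match = TAG_PATTERN.search(title)
    return match.group(1).upper() if match else None


def reminder_for(title, author):
    """Build the canned reminder for an untagged post; None if it is already tagged."""
    if find_tag(title) is not None:
        return None
    tags = ", ".join(f"[{tag}]" for tag in KNOWN_TAGS)
    return REMINDER.format(author=author, tags=tags)
```

The moderator still decides which posts count as promo; the script only saves retyping the same comment.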

TomAwezome@lemmy.world · 29 days ago

Those three tags for promo posts seem like a pretty good compromise, don’t really have any better suggestions for the exact acronyms or tag-specific descriptions. I use LLMs for personal and work but I don’t post promotional material about any of it, I think most people using AI for personal side-projects aren’t making promo posts about it either, so already this won’t affect most people. The most vocal users in Lemmy selfhosted are going to downvote the hell out of anything that has an AGENTS.md or a single commit that smells AI-generated regardless of the tags, this will mostly speed up the dogpiling.

curbstickle@anarchist.nexus · 29 days ago

regardless of the tags, this will mostly speed up the dogpiling.

Yep.

But single word comments getting posted repeatedly can be removed, so while I don’t think the up/downvotes will change, I think the comment section will.

And maybe those folks who rush to downvote will realize they can just filter out the posts instead. We’ll see how that works out though.

lambalicious@lemmy.sdf.org · 27 days ago

And maybe those folks who rush to downvote will realize they can just filter out the posts instead.

Sure, but IMO that’s not the point. About one half of the point of commenting on slopware posts is to make known that the use of AI is disapproved of.

curbstickle@anarchist.nexus · 27 days ago

And I’ll remove it.

captcha_incorrect@lemmy.world · 28 days ago

We need a tag for when the AI use is not known.

Here is a great project: https://github.com/sapristi/mmuxer

Mail Muxer is a Python tool that will monitor your Inbox, and filter incoming emails according to the given configuration.

Is AI used in any shape or form to develop this? No clue, I am not affiliated, but I would really like to post a promo about it and would need to tag it correctly.

curbstickle@anarchist.nexus · 28 days ago

Promo, to be clear, is a self promotion post.

“I found a neat project” doesn’t apply, because you aren’t affiliated and wouldn’t know how it was developed, if AI was used and how, etc. You also wouldn’t be trying to get stars, clicks, donations, or payment for that software, so the promotion rules do not apply.

It’s just a regular post.

SuspiciousCarrot78@aussie.zone · edited · 26 days ago

Hmm. Not sure I agree with “just mark the parts that are AI generated” because that obfuscates the parts that were human made, skewing perception towards “it’s all AI gen”.

Require the full accounting - human, clanker, level.

Design - Human
Implementation - Pair
Testing - Assist
Documentation - Human
Review - Human
Deployment - Human

Reads differently to

Implementation - Pair
Testing - Assist

4/6 human vs ?? / Human is a different trust signal (which is what this is actually about, right?)
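To make the difference concrete, here is a small sketch that tallies both versions of the disclosure. The stage names and levels are copied from the lists above; the dict layout and the `summarize` helper are assumptions for illustration.

```python
# The six stages from the full accounting above.
STAGES = ("Design", "Implementation", "Testing", "Documentation", "Review", "Deployment")

FULL = {
    "Design": "Human",
    "Implementation": "Pair",
    "Testing": "Assist",
    "Documentation": "Human",
    "Review": "Human",
    "Deployment": "Human",
}

# The "only mark the AI parts" version of the same project.
PARTIAL = {"Implementation": "Pair", "Testing": "Assist"}


def summarize(disclosure):
    """Return (stages declared Human, stages left unstated)."""
    human = sum(1 for stage in STAGES if disclosure.get(stage) == "Human")
    unstated = sum(1 for stage in STAGES if stage not in disclosure)
    return human, unstated


print(summarize(FULL))     # (4, 0)
print(summarize(PARTIAL))  # (0, 4)
```

Same project, but the partial version shows zero confirmed human stages and four unknowns, which is the skew being described.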

PS: I’m a fan of acronyms, so how about “show us the STACK or show us the DIRTY”

Spec (Design)
Testing
Assembly (Implementation)
Checks (Review)
Knowledge (Documentation)

Or

Design
Implementation
Review
Testing
Yeet (deployment)

lambalicious@lemmy.sdf.org · 24 days ago

Yeet (deployment)

Stealing this for my next CS or engineering employment.

Brkdncr@lemmy.world · 29 days ago

Sounds like too many rules to me. I’d recommend a “no low effort ai” rule.

Also, AIT is regularly used to abbreviate AI Tool.

curbstickle@anarchist.nexus · 29 days ago

The only ones with extra effort will be promo posts, and this disclosure is regularly requested of them anyway.

You’d also need to define “low effort ai”.

I don’t see that working, sorry.

Brkdncr@lemmy.world · 29 days ago

Asking people to tag AI, and also have a few different AI tags, and also read more than 3 sentences…mods are going to be busy enforcing the rules.

curbstickle@anarchist.nexus · 29 days ago

That would be me, yes. And considering what I already get reports on, this makes for clear practice and would overall reduce the issues that are currently out there.

brucethemoose@lemmy.world · 29 days ago

Normally I’d agree, but the tagging rule won’t affect the majority of posts. I think it’s an acceptable complication, in this case.

Especially with how much vibecoded spam is on the horizon.

brucethemoose@lemmy.world · 29 days ago

Vibecoded spam is deliberately engineered to look “high effort,” so even with the vagueness of such a rule, it wouldn’t cover the spam so well.

Brkdncr@lemmy.world · 29 days ago

How would the proposed rules help? Isn’t spam already covered regardless of AI?

brucethemoose@lemmy.world · edited · 29 days ago

Because, with a cursory glance, it doesn’t always look like spam.

A classic example I see starts with “I built a…” in the title, has a wall of text in the description, and actually promises to do something interesting. Only upon deeply inspecting the code (or trying it yourself)… it becomes clear it’s hallucinated nonsense.

And it’s not always malicious, either. A lot of devs get deep in AI psychosis and truly believe they’re building something revolutionary with their vibe coding agent.

And sometimes these projects are interesting!

Hence it would be EXTREMELY helpful to have this tagged, up front. To me, an [AIP] is a gigantic red flag to warrant extra caution, but not necessarily a smoking gun, and would help “regular” homebuilt projects stand out from the vibecoded ones.

And [AIT] is just nice to have. Some users don’t want to see any AI in /c/selfhosted, period. Hence AI discussion posts get reported as spam because people interpret it as spam, and this would clarify that nebulous distinction, while giving those users a way to easily filter AI posts out.

Brkdncr@lemmy.world · 29 days ago

I wish the mods best of luck with implementing and enforcing this.

AI generally doesn’t need a lot of special handling when it comes to policies. It’s like any other tool, it’s just made it a lot easier for people that don’t know how to code to get something made.

If anything, it might be easier for people to tag their level of experience.

richmondez@lemdro.id · 29 days ago

Maybe, but even experienced devs seem to want to fall into the trap of thinking their expertise will mean they can skim review AI code and spot its mistakes rather than taking the time to properly review and understand the code. Low effort is low effort regardless of your expertise.

brucethemoose@lemmy.world · 29 days ago

Vibecoded self promo is a growing, specific spam problem though.

And an appreciable fraction of Lemmy/Piefed is “anti AI absolutist.”

I think that’s pretty unique.

richmondez@lemdro.id · 29 days ago

I think anything over the “assisted” threshold in the OP is low effort and should be dumped.

Shimitar@downonthestreet.eu · 29 days ago

Woah …

This is overly complex.

As a dev that sometimes publishes something, and I don’t vibe code, but who doesn’t use AI nowadays? That is way too complex. And zero projects today don’t use AI in any forms, not even to search or bugfix …

hertg@infosec.pub · 29 days ago

Then this must come as a surprise to you, but I do not use “AI” whatsoever. Not for coding, not for fixing bugs, and not for coming up with concepts. Crazy right?

Shimitar@downonthestreet.eu · 29 days ago

Not at all …

You are free to do as you please, and I fully respect that.

I was also a no AI coder, but somehow changed my mind slowly as I learned how to use the tool PROPERLY, which can be quite useful.

Learning how to use it has been fun too, so I suggest you give it a try if you haven’t done so yet.

The first risk is abusing it. The second risk is trusting it. And there are many more risks, but AI is a knife and not a pistol: there are good uses for it, but you must be careful and use it properly all the time.

curbstickle@anarchist.nexus · 29 days ago

Disclosure itself is a need, and I can confidently say there are enough people who are “no ai ever”, “all ai all the time”, and “only the AI use I agree with” to make something needed.

About the only way to simplify would be to not define the disclosure types, just to disclose it, but then half the post will be discussing where and how if it’s not defined (along with a bunch of reports about not fully disclosing AI use).

If promo posts included that up front, I don’t think it would be an issue, but it’s rare that any post includes even “I used/didn’t use AI”, if that.

Shimitar@downonthestreet.eu · 29 days ago

Disclosure is needed, I agree.

Let’s say it feels complex, and the tags will not avoid the discussion in the comments anyway … but it’s a start so good for it

curbstickle@anarchist.nexus · edited · 29 days ago

I’d love an idea to trim it down… but with the wide variety of ways AI can be used, it’s hard.

I’m a good example of the “problem” person in a way. I’ll test all kinds of things (including a completely, 100% vibe coded app posted here recently… in a sectioned off vlan of course), but what AI was used for influences where I look. Documentation? Ok, not the worst, but I’m going to check for human review/blatant llm goofs. You used it to figure out how to talk to a serial controlled endpoint? Ok, that’s what needs to be checked first.

You made the whole ass thing with Claude? I’ll test it like I said, but I doubt it would ever end up anywhere near my own production use, it’s more of a curiosity. 99/100 that level of generated is basically the same as calling it unmaintained imo.

So there is definite value to knowing where/how/how much, and if the comments consist of things already stated and just add “slop”, that’s going to get deleted. It’s already disclosed, and the people who comment that should filter instead. It’s a two way benefit this way as I see it.

That said - I’m always open to options here, but considering recent comments and reports since I’ve taken over moderating, something is definitely needed.

Edit: And just to mention - nothing is ever set in stone, if you’ve not seen my other comments about it. Should anything change, or it becomes unwieldy, or someone finds a workaround to abuse, whatever - it’s always open for discussion.

reelworks@lemmy.world · 29 days ago

@[email protected] Thanks for all that you are doing here. I want to come out and say this, even if that’s not helping discussion here. I apologize. I was part of the problem mentioned here.

I have taken down the repo, and the docker images. If I had said anything in the post that started ‘I built’ I take that back. Someone whose core value is not to dilute the truth - I missed something. Even my social post refined with AI (not a native English speaker) did credit AI. And AI usage policy, PR template with disclosure, was present in repo, took a lot of care, but I mentioned ‘not another vibe coded arr project’ in the README and then didn’t say by how much. I presumed. To me vibe coding was completely different from what I did.

I don’t want to give out my resume, or state what I did, why I did it, etc. I will need to say this: I like the policy laid out here and it 100% makes sense. I want to say I believe what you believe or even more so for a lot of reasons but then I also see various shades of grey there. It was not AI psychosis that led me to create an account and a repo like that. Also I didn’t ask for donations or ever expect any. I am sure I don’t have what it takes to maintain the Linux kernel or something like curl, and I am not an experienced OSS maintainer. I am also not a sad AI driver without skills either. I came in without any repute, no hello, nothing, and dropped that recent one you likely talked about here. The product pitch wasn’t clean, the wall of text there also did not point to an easy to understand single purpose tool - it was doing many things. Looking back, it would have been infuriating to folks in that channel. I apologize again, for whatever disruption I made in the arr community from where I have taken much. Servarr had every right to kick me out… Thanks for the clarity. Grateful for the learning.

Not refined with AI.

curbstickle@anarchist.nexus · edited · 29 days ago

None of this is specifically about you, FYI.

There have been a good number of posts, and there are some people very solidly anti AI anything, some who use it as a tool, and some who use it for everything. That combo meant we need rules about it - in addition to the rules about account age and f/loss exceptions and the like.

Edit: For the record, the 100% vibecoded app I tested was posted by someone else.

captcha_incorrect@lemmy.world · edited · 28 days ago

I asked the same question; it should be tagged as [AIP]+[hint] under these rules.

hint: Human acts on the task and the AI surfaces suggestions passively.

curbstickle@anarchist.nexus · 28 days ago

To be clear, only AIP would be in the title; you can mention the hint and the where in the disclosure comment.

lambalicious@lemmy.sdf.org · 27 days ago

And zero projects today don’t use AI in any forms

Everything I do is a counterexample. Get fucked.

timochka@lemmy.zip · 29 days ago

Christ.

When will the forward-planning sub-committee of the AI tagging steering group be meeting? I presume they’re going to need to submit a motion to the ways-and-means council sub-sub-committee first and then maybe we can expect a notice on the procedure to follow for interim planning permission to write a post? Will interim planning permission allow the post to be made (subject to the countersignature of the automated post approval bot) or should it be saved in Drafts and then a separate submission (noting the interim permission and any objections received in the consultation period) be made to the full plenary session of the zoning committee?

Or do we just say “fuck this shit” and find another group?

curbstickle@anarchist.nexus · 29 days ago

You are more than welcome to offer another option.

I’ll mention:

No tag is generating reports
No tag is causing a bunch of unhelpful comments
No disclosure is generating reports
Too basic of disclosure is generating reports

Please, feel free to provide an option.

I’ll point out that what you’re commenting on specifically applies to promo project posts, and nothing else.

timochka@lemmy.zip · edited · 28 days ago

Honestly? Let the wave of neo-ludditism pass?

Is it necessary to respond to every Internet fad? Ignore it and eventually it will go away. Pretending otherwise is like thinking King Cnut can hold back the tide.

(Or, if that’s not good enough - just ban promotion entirely. I don’t give a rat’s arse if code was AI generated or artisanally hand-woven onto magnetic cores by Jeff Minter in a kaftan - the real problem is spam, so just stop all promotional posts and the problem goes away.)

curbstickle@anarchist.nexus · edited · 28 days ago

https://anarchist.nexus/c/[email protected]/p/757813/selfhosted-ai

Please be sure to let me know when the community agrees with you. Or start a new post, if you’d like.

Edit: For your edit, https://anarchist.nexus/c/[email protected]/p/746919/rule-2-clarifications-and-new-rule-proposal and https://anarchist.nexus/c/[email protected]/p/753349/rule-7-adjustment

lambalicious@lemmy.sdf.org · 27 days ago

Did your LLM just have a spasm?