Selfhosted & AI

curbstickle@anarchist.nexus · 1 month ago

midribbon_action@lemmy.blahaj.zone · edit-2 28 days ago

I don’t think you need much hardware at all to do OCR. USPS started doing reliable OCR on ’80s hardware. You really think an AI cluster is necessary for that?

Anyways, cool anecdote, not an actual financial study or report, and very long-winded honestly.

Post-edit reply: wow, that’s kinda fucked up not to disclose that they disassembled it already. Looks like they found better uses. That’s your success story?

curbstickle@anarchist.nexus · 28 days ago

OCR ≠ data ingest.

OCR wouldn’t work, as I mentioned, because of the varying structures of the forms.

I’m sorry my answer was too “long winded” for you, I was trying to be informative, but clearly you aren’t interested in that. Enjoy your day.

midribbon_action@lemmy.blahaj.zone · 28 days ago

Don’t think that’s true. You can run the whole form through and come out with an identical PDF with searchable/copyable text. Even a completely novel form uses the same alphabet. Add some regex to pull out the fields you need to enter, and on failure give it to a human. All of that can be done with Python on a Raspberry Pi. A decade ago.

https://github.com/ocrmypdf/OCRmyPDF
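For anyone following along, here is a rough sketch of the "regex, and on failure give it to a human" step. It assumes the OCR pass has already produced plain text from the form. The field names and patterns (`invoice_number`, `total`) are made up for illustration and are not from the deployment discussed in this thread.

```python
import re

# Hypothetical field patterns, for illustration only.
# Real forms would need their own patterns per field.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*:\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Due|Amount)?\s*:\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})",
        re.IGNORECASE,
    ),
}


def extract_fields(text):
    """Return (found, missing): matched field values and unmatched field names."""
    found, missing = {}, []
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
        else:
            missing.append(name)
    return found, missing


def route(text):
    """Auto-accept a form only if every field matched; otherwise flag it for a human."""
    found, missing = extract_fields(text)
    status = "needs_human" if missing else "auto"
    return {"status": status, "fields": found, "missing": missing}
```

Calling `route(ocr_text)` on each form's text gives either a fully populated record or one flagged for manual entry. This only covers transposing values out of the text; it does nothing about relationships between fields.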

curbstickle@anarchist.nexus · 28 days ago

You’d be wrong.

The fields aren’t all the same kinds of values, which requires the relationships between the data to be evaluated for entry.

You’re assuming this is transposing contents, which was not the issue. Your example is what was initially planned and halted before transitioning to the approach I helped deploy.

midribbon_action@lemmy.blahaj.zone · edit-2 26 days ago

That’s wrong, you didn’t know that there’s another if/else statement required by them. That’s what the supercomputer is for.

That’s how you sound.

Edit: [Completely new information to me, and a completely different justification for using a supercomputer over a Raspberry Pi. This is now the third attempt: (1) it required OCR, (2) forms have different structures, (3) logical relationships between data (if/else statements).]

curbstickle@anarchist.nexus · 28 days ago

So I’ll go back to my previous comment; you’re not actually interested in understanding the use, you have a pre-determined (and uninformed) view of use and operation, and providing that information as an example is “long-winded”.

I’ll be done with this discussion now. Enjoy your day.

midribbon_action@lemmy.blahaj.zone · edit-2 26 days ago

The difference is that each person didn’t need to hunt across the form to find the details. When the comparison comes up for approval at each stage, they get the snippet being brought in and the field it’s being applied to.

This is the only technical detail [about the need for an LLM cluster] in the whole 500-word comment.
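To make the quoted approval step concrete, here is a minimal sketch of what "the snippet being brought in and the field it's being applied to" might look like as a data structure. The names `ReviewItem`, `render`, and `decide` are hypothetical and do not describe the actual system, which the thread never details.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReviewItem:
    """One proposed entry shown to a reviewer (hypothetical structure)."""
    field_name: str                  # target field the value would be entered into
    snippet: str                     # text lifted from the source form
    proposed_value: str              # value proposed for the field
    approved: Optional[bool] = None  # None until a reviewer decides


def render(item):
    """Format the item so the reviewer sees the snippet next to its target field."""
    return (
        f"Field: {item.field_name}\n"
        f"Snippet: {item.snippet!r}\n"
        f"Proposed: {item.proposed_value}"
    )


def decide(item, approve):
    """Return a copy of the item with the reviewer's decision recorded."""
    return replace(item, approved=approve)
```

The point of the structure is that the reviewer only approves or rejects a pairing. How the snippet gets matched to the field in the first place is the part under dispute here, and this sketch says nothing about it.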
