• brucethemoose@lemmy.world
    9 hours ago

    There’s a ‘meme’ trend of local ML tinkerers messing with the Epstein files as a dataset: https://huggingface.co/datasets/tensonaut/EPSTEIN_FILES_20K/

    See: text embeddings https://huggingface.co/datasets/svetfm/epstein-files-nov11-25-house-post-ocr-embeddings


    Edit: Now I’m pondering making an “EpsteinGPT” finetune myself. Maybe like a 4B-14B model for the sole purpose of Epstein RAG? Or a 32B responding in the style of the Epstein email text, just because.

    • khepri@lemmy.world
      9 hours ago

      Just imagine having to explain being in possession of a handmade “EpsteinGPT” to someone 🤣🤣

      • brucethemoose@lemmy.world
        8 hours ago

        Meme finetunes are nothing new.

        As an example, there are DPO datasets made of paired positive/negative examples, intended to train LLMs to prefer the polite, helpful response over the negative one. Some of them use toxic comments plucked from the web as the negative examples.

        And the immediate community thought was “…What if I reversed them?”
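
        For readers unfamiliar with the format, here is a minimal sketch of what such a preference pair looks like and what "reversing" it means. The field names `prompt`, `chosen`, and `rejected` follow a common convention for preference data and are an assumption here, not taken from any dataset mentioned in this thread; `reverse_pair` is a hypothetical helper and the strings are placeholders.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    """One DPO-style record: a prompt plus a preferred and a dispreferred reply."""
    prompt: str
    chosen: str    # the response the model is trained to prefer
    rejected: str  # the response the model is trained to avoid


def reverse_pair(pair: PreferencePair) -> PreferencePair:
    """Swap the preferred and dispreferred responses (hypothetical helper)."""
    return {
        "prompt": pair["prompt"],
        "chosen": pair["rejected"],
        "rejected": pair["chosen"],
    }


# Placeholder record; real datasets hold full text in each field.
pair: PreferencePair = {
    "prompt": "<user question>",
    "chosen": "<polite, helpful answer>",
    "rejected": "<negative example>",
}
flipped = reverse_pair(pair)
```

        Training on the flipped records pushes the model toward whatever the dataset authors originally labelled as the bad response, which is the whole joke.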

        • khepri@lemmy.world
          8 hours ago

          haha just imagining people showing off their collections, “here’s my Mr. Rogers chatbot, and Thomas Jefferson, and even Luffy from One Piece! And uh…oh yeah over here we have EpsteinGPT for when I, I mean for if, um…it’s for lulz ok?! Don’t look at me like that, where are you going?!”

          • brucethemoose@lemmy.world
            8 hours ago

            It’s literally “this one is my fursona. This one won’t refuse BDSM, but it’s not as eloquent. Oh, this one is lobotomized but really creative.” I kid you not. Here is an example, and note that it is one of 115 uploads from one account:

            https://huggingface.co/Mawdistical/RAWMAW-70B?not-for-all-audiences=true

            And I love that madness. It feels like the old internet. In fact, furries and horny roleplayers have made some good code contributions to the space.


            Early on, there were a few ‘character’ finetunes or more generic ones like ‘talk like a pirate’ or ‘talk only in emojis.’ But as local models got more advanced, they got so good at adopting personas that the finetuning focused more on writing ‘style’ and storytelling than on emulating specific characters. For example, one trained specifically to stick to the role of a dungeon master: https://huggingface.co/LatitudeGames/Nova-70B-Llama-3.3

            Or this one, where you can look at the datasets and see the anime ‘style’ they’re trying to massage in: https://huggingface.co/zerofata/GLM-4.5-Iceblink-106B-A12B

    • lemming741@lemmy.world
      9 hours ago

      Instead of em dashes, it’s full of extra spaces before and after each period, and extra periods.

      • brucethemoose@lemmy.world
        8 hours ago

        I dunno what the ‘writing style’ would end up as. The bulk of the text seems to be formatted like this:

        ...
        10. Is Epstein cooperating with federal suit against Bear Stearns hedge fund managers Ralph Cioffi
        and Matthew Tannin? Will he testify in their cases?
        
        11. Mr Epstein was deposed on this week, on Thursday. Is it true that he answered almost every
        question by invoking his Fifth Amendment rights?
        
        12. Defense attorney Brad Evans has filed a motion to freeze Mr Epstein’s assets. Has Mr.
        Epstein moved his money from the US offshore or abroad, or does he intend to, in order to
        protect his assets from possible damage claims?
        
        13. What did Mr. Epstein do during his work release program while serving time. Reports have
        said he engaged in “scientific research.” If so, what was he researching?
        ...
        
        Response
        
        "That's because it isn't, and everyone here
        (apparently save one) is rational and objective enough
        to understand that. Physical phenomena, and
        phenomena in general, are
        
        ultimately perceptual in nature and subject to
        observational replication - that's why they call
        physics an empirical science. But consciousness is
        not.
        
        Consciousness cannot be objectively, replicably
        observed. Its putative physical correlates, including
        ...
        
        Bill Clinton identified in lawsuit against his former friend and
        pedophile Jeffrey Epstein who had 'regular' orgies at his Caribbean
        compound that the former president visited multiple times
        
        e The former president was friends with Jeffrey Epstein, a financier who was arrested
        in 2008 for soliciting underage prostitutes
        
        e Anew lawsuit has revealed how Clinton took multiple trips to Epstein's private island
        where he 'kept young women as sex slaves'
        
        e Clinton was also apparently friends with a woman who collected naked pictures of
        underage girls for Epstein to choose from
        
        e He hasn't cut ties with that woman, however, and invited her to Chelsea's wedding
        
        e Comes as friends now fear that if Hillary Clinton runs for president in 2016, all of
        their family's old scandals will be brought to the forefront
        
        e Epstein has a host of famous friends including Prince Andrew who stayed at his New
        
        York mansion AFTER his arrest
        By Daily Mail Reporter
        Published: 09:06 EST, 19 March 2014 | Updated: 21:10 EST, 5 January 2015
        

        I’d have to generate prompt/response wrappers too. But it would definitely bring up Trump and Clinton randomly, heh.

        …There are automated metrics to rank English text by reading level, ‘quality’ and such. I guess the dataset could be filtered down to the most ‘interesting’ emails and reformatted.
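
        As a rough illustration of a reading-level metric, the sketch below computes the Flesch Reading Ease score (206.835 − 1.015 × words per sentence − 84.6 × syllables per word) and ranks text chunks by it. The syllable counter is a crude vowel-group heuristic and the function names are hypothetical; this is not the method any of the linked datasets use, just one simple way such a filter could work.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (crude heuristic, at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier to read. Returns 0.0 for empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def rank_by_readability(chunks: list[str]) -> list[str]:
    """Sort text chunks from easiest to hardest to read."""
    return sorted(chunks, key=flesch_reading_ease, reverse=True)


easy = "The cat sat. The dog ran."
hard = "Consciousness cannot be objectively, replicably observed."
ranked = rank_by_readability([hard, easy])
```

        A score like this only measures sentence and word length, so OCR noise (stray letters, broken lines) would need separate cleanup before the ranking means much.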

        • lemming741@lemmy.world
          6 hours ago

          Ah, I misinterpreted it as the most recent email dump. Like you could email back and forth with an avatar of jee