• MacN'Cheezus@lemmy.today
    5 months ago

    Llava and Bakllava are two Ollama models that can not only extract text but also describe what’s happening on screen.
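For anyone who wants to try it, here is a rough sketch of sending a screenshot to a local Ollama server. It assumes Ollama is running on its default port (11434) and that the `llava` model has already been pulled. The function names `build_request` and `describe_screenshot` and the prompt text are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes, prompt, model="llava"):
    """Build the JSON body for Ollama's /api/generate endpoint.

    Images are passed as base64-encoded strings in the "images" list.
    """
    return {
        "model": model,  # "bakllava" should work the same way if it is pulled
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def describe_screenshot(path, prompt="Describe what is on this screen and transcribe any text.", model="llava"):
    """Send the image at `path` to the local Ollama server and return its reply."""
    with open(path, "rb") as f:
        payload = build_request(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # With streaming disabled, the generated text is in the "response" field.
        return json.loads(response.read())["response"]
```

Calling `describe_screenshot("screenshot.png")` (hypothetical file name) should return the model's description as a string. Expect it to be slow without a decent GPU.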

    Using tesseract-ocr, as the other guy suggested, is probably simpler and less resource-intensive, though.
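If plain text extraction is all you need, a minimal sketch of the tesseract route could look like this. It assumes the `tesseract` binary is installed and on your PATH, along with the English language data. The helper names `tesseract_command` and `ocr_image` are just for illustration.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract CLI call; "stdout" as the output base prints the text
    instead of writing a file."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout
```

Something like `print(ocr_image("screenshot.png"))` (hypothetical file name) would then dump whatever text tesseract finds, with no description of the rest of the screen.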