Instead of using Character.AI, which will send all my private conversations to governments, I found this solution. Any thoughts on this? 😅

  • fishynoob@infosec.pub · 23 days ago

    I had never heard of KoboldAI. I was going to self-host Ollama and try it out, but I’ll take a look at Kobold too. I had never heard about controls on world-building and dialogue triggers either; there’s a lot to learn.
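
    (For reference, my Ollama plan was basically just its local HTTP API; a minimal, untested sketch, assuming the default port and a model already pulled with `ollama pull`:)

    ```python
    import requests

    # Ollama serves a local HTTP API on port 11434 by default;
    # "llama3" stands in for whichever model you've pulled.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Describe the tavern.", "stream": False},
    )
    print(resp.json()["response"])
    ```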

    Will more VRAM solve the problem of not retaining context? Can I throw 48GB of VRAM towards an 8B model to help it remember stuff?

    Yes, I’m looking at image generation (Stable Diffusion) too. Thanks
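
    (If anyone lands here later: the Stable Diffusion side seems to boil down to a few lines with the diffusers library. A sketch, with one common public checkpoint as a placeholder rather than a recommendation:)

    ```python
    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public Stable Diffusion checkpoint and generate one image.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a castle on a cliff at sunset").images[0]
    image.save("castle.png")
    ```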

    • tal@lemmy.today · 23 days ago (edited)

      Will more VRAM solve the problem of not retaining context?

      IIRC (I ran KoboldAI with 24GB of VRAM, so I wasn’t especially constrained), VRAM does impose some limits on the number of tokens that can be sent as a prompt, though I never hit them. The software imposes limits of its own, however: you can only raise the prompt token count so far, no matter how much VRAM you have. More VRAM does let you run larger, more “knowledgeable” models, and lets you keep more of a model’s layers on a given GPU.

      I’m not sure whether those software limits are purely arbitrary (to keep performance reasonable) or whether very large prompts run into other technical issues.

      It definitely can’t keep an entire conversation of any real length as input when generating a new response, though.
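
      For a rough sense of why context costs VRAM at all: every token in the prompt is held in the attention KV cache while generating. A back-of-the-envelope sketch (the dimensions below assume a Llama-3-8B-style model in fp16 and are illustrative, not authoritative; check your model’s actual config):

      ```python
      # Rough KV-cache estimate: each token in context stores a key and a
      # value vector per layer, on top of the model weights themselves.
      n_layers     = 32   # assumed Llama-3-8B-style dimensions
      n_kv_heads   = 8    # grouped-query attention
      head_dim     = 128
      bytes_per_el = 2    # fp16/bf16

      bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el  # K and V

      for context in (2048, 8192, 32768):
          print(f"{context:6d} tokens -> {context * bytes_per_token / 2**30:.2f} GiB of KV cache")
      ```

      At 2048 tokens that’s only about a quarter of a GiB, so on a 24GB card the cache itself isn’t the bottleneck; the ceiling really is in the software.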

      EDIT: I think that the last time I looked at KoboldAI (I haven’t run it recently), the highest token count per prompt was 2048, and this seems to mesh with that:

      https://www.reddit.com/r/KoboldAI/comments/yo31hj/can_i_get_some_clarification_on_some_things_that/

      The 2048 token limit of KoboldAI is set by pyTorch, and not system memory or vram or the model itself

      So basically, each response is generated from at most 2048 tokens (a token is roughly a word) of knowledge about the conversation, your characters, and your world. Any other knowledge has to come from the model itself, which, for sex chatbots, can be trained on a ton of erotic text and literature, but that’s fixed at training time; it doesn’t pick up anything new about your particular conversation, environment, or the characters you’ve created.
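
      To make that concrete, the front-end has to do something like the following truncation before every generation. A minimal sketch with a hypothetical token counter; KoboldAI’s actual logic is more involved (it also budgets for memory and world info):

      ```python
      def build_prompt(history, new_message, count_tokens, budget=2048):
          """Keep only the most recent turns that fit in the token budget.

          history:      list of strings, oldest turn first
          count_tokens: tokenizer-dependent counter (hypothetical stand-in)
          Whatever falls off the front is gone: the model never sees it
          again unless you re-state it or it was trained into the model.
          """
          kept = [new_message]
          used = count_tokens(new_message)
          for turn in reversed(history):
              cost = count_tokens(turn)
              if used + cost > budget:
                  break  # everything older than this is dropped
              kept.insert(0, turn)
              used += cost
          return "\n".join(kept)

      # Crude whitespace "tokenizer" purely for illustration:
      toks = lambda s: len(s.split())
      prompt = build_prompt(["You enter the tavern.", "I order an ale."],
                            "What does the barkeep say?", toks)
      ```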

      • fishynoob@infosec.pub · 23 days ago

        I see. Thanks for the note. I think diminishing returns set in very quickly beyond 48GB of VRAM, so I’ll likely stick to that limit. I wouldn’t want to use models hosted in the cloud, so that’s out of the question.