Instead of using Character.AI, which will send all my private conversations to governments, I found this solution. Any thoughts on this? 😅

  • tal@lemmy.today · 22 days ago

    I’ve run KoboldAI on local hardware, and it has some erotic models. From my fairly quick skim of character.ai’s syntax, I think that KoboldAI has more powerful options for creating worlds and triggers. KoboldAI can split layers across all available GPUs and your CPU, so if you’ve got the electricity, the power supply, and the room cooling, and are willing to blow the requisite money on multiple GPUs, you can probably make it respond about as quickly as you want.
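
    If you want a feel for how the layer-splitting works, here's a rough sketch using Hugging Face transformers/accelerate rather than KoboldAI's own loader (the model name is just a placeholder; KoboldAI handles all of this for you through its UI):

    ```python
    # Sketch: spreading a model's layers across all available GPUs, spilling the
    # rest to CPU RAM. Uses transformers + accelerate, not KoboldAI itself; the
    # model name is a placeholder, not a recommendation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "some/roleplay-model"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # split layers across every GPU, overflow to CPU
        torch_dtype="auto",
    )

    prompt = "I asked Jessica to go to the store."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```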

    But more broadly, I’m not particularly impressed with what I’ve seen of sex chatbots in 2025. They can only feed a limited number of tokens from earlier in the conversation into each new message, which means that as a conversation progresses, they increasingly fail to take earlier content into account. It’s possible to get into loops, or to forget facts about characters or the environment that were established earlier in the conversation.

    Maybe someone could make some kind of system to try to summarize and condense material from earlier in the conversation or something, but…meh.

    As generating pornography goes, I think that image generation is a lot more viable.

    EDIT:

    KoboldAI has the ability to prefix the current prompt with a given sentence whenever the prompt contains a matching term, which permits dumping information about a character into each prompt. For example, for the prompt “I asked Jessica to go to the store”, one could have a trigger that matches on “Jessica” and injects “Jessica is a 35-year-old policewoman”. That’d permit providing static context about the world. I think that maybe what would need to happen is to have a second automated process in the background trying to summarize and condense information from earlier in the conversation about important prompt terms, and then writing new triggers attached to those terms, so that each prompt is sent with a bunch of relevant information. Manually writing static data to add context faces some fundamental limits.
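
    To make the trigger idea concrete, here's a rough sketch of the mechanism in plain Python. This isn't KoboldAI's actual code, and the entries are made up; it just shows the keyword-match-and-prepend behavior:

    ```python
    # Rough sketch of the "world info" trigger idea: if a keyword appears in the
    # prompt, prepend its entry so the model gets that static context every time.
    # Entries and keywords here are invented examples.

    world_info = {
        "Jessica": "Jessica is a 35-year-old policewoman.",
        "the store": "The store is a run-down corner shop on 5th Street.",
    }

    def inject_world_info(prompt: str) -> str:
        triggered = [entry for keyword, entry in world_info.items() if keyword in prompt]
        return "\n".join(triggered + [prompt])

    print(inject_world_info("I asked Jessica to go to the store."))
    # Jessica is a 35-year-old policewoman.
    # The store is a run-down corner shop on 5th Street.
    # I asked Jessica to go to the store.
    ```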

    • fishynoob@infosec.pub · 22 days ago

      I had never heard of Kobold AI. I was going to self-host Ollama and try with it but I’ll take a look at Kobold. I had never heard about controls on world-building and dialogue triggers either; there’s a lot to learn.

      Will more VRAM solve the problem of not retaining context? Can I throw 48GB of VRAM towards an 8B model to help it remember stuff?

      Yes, I’m looking at image generation (Stable Diffusion) too. Thanks

      • tal@lemmy.today · 22 days ago

        Will more VRAM solve the problem of not retaining context?

        IIRC (I ran KoboldAI with 24GB of VRAM, so I wasn’t super-constrained), VRAM does impose some limits on the number of tokens that can be sent as a prompt, which I did not hit. However, there are also limits imposed by the software; you can only increase the number of tokens that get fed in so far, regardless of VRAM. More VRAM does let you use larger, more “knowledgeable” models, as well as putting more layers on a given GPU.

        I’m not sure whether those are purely-arbitrary, to try to keep performance reasonable, or if there are other technical issues with very large prompts.

        It definitely isn’t capable of keeping the entire previous conversation (once you get one of any length) as an input to generating a new response, though.

        EDIT: I think that last I looked at KoboldAI — I haven’t run it recently — the highest token count per prompt one could use was 2048, and this seems to mesh with that:

        https://www.reddit.com/r/KoboldAI/comments/yo31hj/can_i_get_some_clarification_on_some_things_that/

        The 2048 token limit of KoboldAI is set by pyTorch, and not system memory or vram or the model itself

        So basically, each response is being generated looking at a maximum of 2048 tokens for knowledge about the conversation and your characters and world. Other knowledge has to come from the model, which, for sex chatbots, can be trained on a ton of erotic text and literature, but that’s fixed at training time; it doesn’t bring any more knowledge about your particular conversation or environment or the characters you’ve created.
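
        To illustrate why older content falls off, here's a sketch of the kind of truncation that has to happen with a fixed token budget. The word-count "tokenizer" is a crude stand-in, and this isn't KoboldAI's implementation, just the general idea:

        ```python
        # With a fixed token budget (e.g. 2048), only the most recent messages
        # that fit get sent along with each new prompt; everything older is dropped.

        MAX_CONTEXT_TOKENS = 2048

        def rough_token_count(text: str) -> int:
            return len(text.split())  # real tokenizers count sub-word pieces, not words

        def build_prompt(history: list[str], new_message: str) -> str:
            budget = MAX_CONTEXT_TOKENS - rough_token_count(new_message)
            kept = []
            for message in reversed(history):   # walk backwards from the newest
                cost = rough_token_count(message)
                if cost > budget:
                    break                       # everything older than this is dropped
                kept.append(message)
                budget -= cost
            return "\n".join(list(reversed(kept)) + [new_message])
        ```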

        • fishynoob@infosec.pub · 22 days ago

          I see. Thanks for the note. I think that beyond 48GB of VRAM, diminishing returns set in very quickly, so I’ll likely stick to that limit. I wouldn’t want to use models hosted in the cloud, so that’s out of the question.

    • fishynoob@infosec.pub · 22 days ago

      Thanks for the edit. You have a very intriguing idea: a second LLM in the background that maintains a summary of the conversation plus the static context might make things a lot better. I don’t know if anyone has implemented it, or how one could DIY it with Kobold/Ollama. I think it’s an amazing idea for code assistants too, if you’re doing a long coding session.
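
      Something like this is what I'm imagining for the DIY route. It's just a sketch against Ollama's /api/generate HTTP endpoint; the model name and the keep_recent cutoff are placeholders, and I haven't tested it:

      ```python
      # Sketch of the "second LLM in the background" idea: ask a local model to
      # condense the older part of the conversation, then keep only the summary
      # plus recent messages in future prompts. Model name is a placeholder.
      import requests

      OLLAMA_URL = "http://localhost:11434/api/generate"
      SUMMARY_MODEL = "llama3"  # placeholder model name

      def summarize(old_messages: list[str]) -> str:
          prompt = (
              "Condense the following roleplay conversation into a short list of "
              "facts about the characters, setting, and current situation:\n\n"
              + "\n".join(old_messages)
          )
          resp = requests.post(
              OLLAMA_URL,
              json={"model": SUMMARY_MODEL, "prompt": prompt, "stream": False},
              timeout=120,
          )
          return resp.json()["response"]

      def build_context(history: list[str], keep_recent: int = 10) -> str:
          if len(history) <= keep_recent:
              return "\n".join(history)
          summary = summarize(history[:-keep_recent])
          return "Summary so far: " + summary + "\n" + "\n".join(history[-keep_recent:])
      ```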