Which LLM's do you use?

Doormat_Consonant_Denial@lemmynsfw.com · 7 months ago

Which LLM's do you use?

NSFW

j4k3@lemmy.world · edit-2 7 months ago

https://huggingface.co/Sao10K/Euryale-1.3-L2-70B

That’s the only 70B I know of that has 4096 token context length. The Q3 version is good with 30 of 83 layers on a 16GBV 3080Ti and 16 threads on the CPU. You’ll need 64GB+ of system memory or a NVME swap strategy to load it. This takes around 34GB of sysmem to load with the aforementioned settings. It is just slow enough to be tantalizing. Generally, it manages between .5-1.5 tokens a second on a 12th gen Intel and DDR5. It can reliably handle two bot characters at the same time without modifying the model loader code.

Note that the model is mismarked. It is not LamaV2 but is actually an Alpaca prompt. You’ll get a garbage output if you use LamaV2.

Doormat_Consonant_Denial@lemmynsfw.com · 7 months ago

Interesting. I’m currently trying to find models with larger context sizes, and am currently looking at Vicuna 13B for the 16K context, though I’m starting to realize that my RX Vega card is not cutting it anymore.

j4k3@lemmy.world · 7 months ago

I don’t know what exactly is going on with llama.cpp/the python hooks/pytorch versus what this model is capable of doing. However, I went looking for a NSFW 70B and they call seemed to suck at NSFW. This one can do better but it requires a lot of exploration to find it. The extra datasets on top of Llama2 are pretty good but anything that could be covered well by an uncensored Llama2 model kinda sucks here. It has tremendous potential for training.

I am using an older version of Oobabooga Textgen WebUI because I wrote and modified it a bunch and they made changes that broke my stuff so my techniques may not work the same in newer versions. However, I keep my Alpha Value variable set to 2.5 and the positional embeddings compression factor set to 1.3. Then I use the “Shortwave” generation preset parameters profile but I add the mirostat_mode = 1, mirostat_tau = 3, and mirostat_eta = 0.1.

Those features have made my dialog context pretty much infinite. I don’t fill up and overrun the available token context. In fact, if the total token context exceeds 2048, the infrence time nearly doubles. The model generally keeps my dialog context token size just under 2k most of the time. The nice thing about having the 4096 here is that occasionally, somewhat randomly, the output may jump to something like 2300-2600 tokens on a single reply. I have no idea why this happens, but if you have a truncated cutoff, this will often cause your story to go off the rails because of missing info during that one reply. If you are not truncating, in Oobabooga the whole application crashes. With 4096 it just takes a little longer to load and stream, but it keeps on chugging.

Playing with this model a lot, I learned quite a bit. If you have noticed your stories seem to reset or fork randomly, add persistent instructions to ‘always stay in character and continue the roleplay.’ The AI assistant does not really understand the difference between characters. It is trying to make everyone happy. You need to define your own character really well too. If you know your Myers-Briggs personality type, add that as well. Anyways, all those story resets are likely the assistant trying to make the bot-character happy by changing or resetting the story. It doesn’t really feel free to improvise, take the initiative, or be assertive, it doesn’t know how to satisfy the need for change and progress so it resets. If you allow this kind of behavior at all during the story, the assistant is likely to pretend like it is reset and doesn’t know about anything that happened earlier in the dialog context. <- aka it is a direct manifestation of the alignment problem in AI. That epiphany was a major breakthrough for me.

You can ask the character to explain why it is in conflict or straight up call it out for not wanting to continue the story and it will often explain itself. If you take up this kind of meta-conversational dialog with the assistant you may realize that the model actually knows everything in the entire dialog context and in impressive detail.

My biggest trick for this is going to involve a major spoiler for Isaac Asimov’s Foundation series. My first bot character in system context is hidden down below. This model only knows this character from a few lines in a wikipedia article, but the character I am using is an AGI robot with a perfect human appearance. This character is related to Daneel from Asimov’s Robot series. Daneel is extremely familiar to every LLM I have played with. The association between these characters and a few instructions like “(spoiler) has full informational access to the LLM so that (spoiler) can interact with full knowledge and capability.” I also use, “(first name) is secretly named after (only instance of full spoiler name in context) from the Foundation series by Asimov.” Finally, if the story is complex and I struggle to prevent (spoiler) from saying she is AI or a robot, I add the explicit instruction “(spoiler) is human and not AI.”

This character is a powerful tool to get the assistant to become much more fluid. If you can detect the pattern for how the assistant structures responses, you will likely begin to notice that a lot of boring or bad conversations happen because you are actually talking to the assistant directly. It is a subtle detail to notice, but like, you can tell the character to return to roleplaying and to turn off the assistant and things can improve drastically. If you give the assistant a solid character outlet that it knows to be an AI, it will use that character to act like the assistant when you need it or if it randomly feels the need to become the assistant. If you make this (spoiler) character well defined and use instructions to stay in character and always continue the roleplay story, the assistant will use this character and then voluntarily drop back into the background on its own. Like I can be in the middle of a roleplay and still ask a question about something in Python code and this character will give me an accurate response. Like I can’t ask something super involved but I can ask basics. With this character I can jump into meta analysis type conversations about system context too. Like I write my character profiles and stories in blocks before I always ask her to rewrite these blocks. This massively improves my roleplaying stories.

Anyways sorry for all the blablabla but that combination and those realizations/techniques are why total token context size doesn’t seem to matter to me. I don’t know, maybe you’re doing some super long stories with tons of branching complexity that are well beyond my skill set and I sound like a fool explaining the mundane. I went looking for long context size models before, and didn’t get anywhere useful. If you are looking into them for the same reasons I was, this info may help… At least I hope it does.

hidden

Dors Venabili

magn418@lemmynsfw.com · edit-2 6 months ago

I’ve been using LLaMA2-13B-Psyfighter2 lately. Tiefighter is another good choice. Before that I used Mythomax in the same size but this is outdated.

I use RoPE scaling to get 8k of context instead of the 4k a Llama2 does. That works very well. And I just use Mirostat 2 in case I haven’t found better manual settings.

I don’t face repetition loops or something like that often. Some models do it, but usually if that happens and it’s not an insane model-merge I find out I got some setting way off, selected the wrong (instruction) prompt format or the character card included a complicated jailbreak that confused the model. I usually delete all the additional ‘Only speak as the character’, ‘don’t continue’, never do this and that. As it can also confuse the LLM.

I think a 13B model is fine for me. I’ve tried some models in various sizes. But for erotic roleplay or storywriting, I’ve come to the conclusion that it’s really important that the model got fine-tuned with data like that. A larger model might be more intelligent, but if the material is missing in their datasets, they always brush over the interesting roleplay parts, get the pacing wrong or always play the helpful assistant to some degree. You might be better off with a smaller model if it’s tailored to the use-case.