ThreeJawedChuck

3mo ago

Don't overlook llama.cpp's rpc-server feature.

It loads the rpc machine’s part of the model across the network every time you start the server,

I have to correct myself. It appears newer versions of rpc-server have a cache option and you can point them to a locally stored version of the model to avoid the network cost.

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

Mistral (24B) models are really bad at long context, but this is not always the case. I find that Qwen 32B and Gemma 27B are solid at 32K

It looks like the Harbinger RPG model I'm using (from Latitude Games) is based on Mistral 24B, so maybe it inherits that limitation? I like it in other ways. It was trained on RPG games, which seems to help it for my use case. I did try some general purpose / vanilla models and felt they were not as good at D&D type scenarios.

It looks like Latitude also has a 70B Wayfarer model. Maybe it would do better at bigger contexts. I have several networked machines with 40GB VRAM between all them, and I can just squeak I4Q_XS x 70B into that unholy assembly if I run 24000 context (before the SWA patch, so maybe more now). I will try it! The drawback is speed. 70B models are slow on my setup, about 8 t/s at startup.

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

Ah, great idea about the low temp for rules and high for creativity. I guess I can easily change it in the front end, although I also set the temp when I start the server, and I'm not sure which one takes priority. Hopefully the frontend does, so I can tweak it easily.

Also your post just got me thinking about the DRY sampler, which I'm using, but might be causing troubles for cases where the model legit should repeat itself, like an !inventory or !spells command. I might try to either disable it or add a custom break token, like the ! mark.

I think ST can show token probabilities, so I'll try that too, thanks. I have so much to learn! I really should try other frontends though. ST is powerful in a lot of ways like dynamic management of the context, but there are other things I don't like as much. It attaches a lot of info to a character that I don't feel should be a property of a character. And all my D&D scenarios so far have been just me + 1 AI char, because even though ST has a "group chat" feature, I feel like it's cumbersome and kind of annoying. It feels like the frontend was first designed around one AI char only, and then something got glued on to work around that limitation.

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

Will do, thanks for the tip. Your description does sound like a good fit for the idea. As long as it supports network inference between machines with heterogeneous cards, it would work for what I have in mind.

3mo ago

Vote manipulation bots using sh.itjust.works?

Jump

to get the ball rolling usually yeah

I've always been a little skeptical of the up/downvote mechanism on any social media platform. It makes it so easy to weaponize the human bias towards group conformity, by doing exactly what you described.

I won't try to argue the voting has no benefits. It can be helpful to reduce the reach of truly bad faith posts for example, and everyone loves the little dopamine hit of seeing one's own posts upvoted - me included. Just that it also has real drawbacks, and I'm not sure whether the good outweighs the bad.

3mo ago

Despite record gaming revenues, Nvidia reportedly plans to cut RTX 50 series production to allocate it to new AI hardware

Jump

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

Or be a 90s computer text adventure

Zork on steroids!

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

What was your setup for this experiment?

I'm using llama.cpp + sillytavern. I'm very much in learning mode with ST however, so I'm confident I could be using it in a more effective manner than I know how to at the moment. It seems like koboldcpp + ST ought to be similar to what I'm doing.

3mo ago

Noob experience using local LLM as a D&D style DM.

Jump

but it’s no fun when the LLM simply says “yeah, sure whatever.” I

I hear ya. LLMs tend to heavily tilt toward what the user wants, which is not ideal for an RPG.

Have you tried any of the specialized RPG models? The one I'm using now has, at least twice so far, put me into a situation where I felt my party (2 chars, me and the AI) were going to die unless we ran away. We just finished a very difficult fight, used everything at our disposal, and sustained several serious injuries in the process. Then an even more powerful foe appeared, and it really felt that was going to be the end unless we ran. Would it really have killed us? I can't say, but I did get a genuine sense of that. It might help that in the system prompt, I had put this:

The story should be genuinely dangerous and frightening, but survivable if we use our wits.

I have the feeling the generalist models are much more tilted in the "yeah, sure, whatever" direction. I tried at least one RPG focused model (Dan's dangerous winds, or something like that) which was downright brutal, and would kill me off right away with no opportunity for me to do anything about it. That wasn't fun for the opposite reason. But like you say, it's also not fun to have no risk and no boundaries to test one's mettle. The sweet spot is can be elusive.

I'm thinking that a non-LLM rules system around an LLM for descriptive purposes could really help here too, to enforce a kind of rigor on the experience.

3mo ago

How to use GPUs over multiple computers for local AI?

Jump

Coincidentally, I have just been trying this using the llama.cpp server and two machines on my local LAN.

I made a post about it in https://sh.itjust.works/c/localllama. I'm brand new to lemmy (literal hours) so I'll probably do this all wrong, but maybe this is a link to my post? https://sh.itjust.works/post/39137051 I'm a little confused about posting links in this federated system, but I hope that works. The upshot is that I got it working fine across two machines, and it was easy to set up, but it has a few minor (to me) drawbacks.

3mo ago

Don't overlook llama.cpp's rpc-server feature.

Jump

To add to my lame noob answer, I found this, which has a better rundown of ollama vs llama.cpp. I don't know if it's considered bad form to link to ##ddit on lemmy, so ~~I'll just put the title here and you can search for it on there if you want~~ link added per comment from mutual_ayed below. There are a couple informative posts which are upvoted. "There is a big difference between use LM-Studio, Ollama, LLama.cpp?"

3mo ago

Don't overlook llama.cpp's rpc-server feature.

Jump

What’s the advantage over Ollama?

I'm very new to this so someone more knowledgeable should probably answer this for real.

My impression was that ollama somehow uses the llama.cpp source internally, but wraps it up to provide features like auto-downloading of models. I didn't care about that, but I liked the very tiny dependency footprint of llama.cpp. I haven't tried ollama for network inference.

There are other backends too which support network inference, and some posts allege they are better for that than llama.cpp is. vllm and ... exllama or something like that? I haven't looked into either of them. I'm running on inertia so far with llama.cpp, since it was so easy to get going and I'm kinda lazy.