Wondering what services to test on either a 16 GB RAM "AI capable" ARM64 board or on a laptop with a modern RTX card. Only looking for open-source options, but curious to hear what people say. Cheers!
I have the same setup, but it's not very usable as my graphics card only has 6 GB of VRAM. I want one with 20 or 24 GB, as the 6B models are a pain and the tiny ones don't give me much.
Ollama was pretty easy to set up on Windows, and it's easy to download and test the models Ollama has available.
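If you'd rather script it than click around, Ollama's REST API listens on port 11434 by default, so testing a downloaded model from Python takes a few lines. A minimal sketch, assuming you've already pulled the model tag used below:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

# Ask a one-off question via /api/generate (non-streaming for simplicity).
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "llama3.2",  # assumption: substitute whatever tag you actually pulled
        "prompt": "Explain what a container image is in two sentences.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```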
I have this exact same setup.
Open WebUI has more features than I've been able to use, such as functions and pipelines.
I use it to share my LLMs across my network. It has really good user management, so I can set up a user for my wife or brother-in-law and give them a general-use LLM, while my dad and I can take advantage of coding-tuned models.
The code formatting and code execution functions are great. It's overall a great UI.
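For the sharing-across-the-network part: both Open WebUI and Ollama expose OpenAI-compatible endpoints, so other machines can reuse the same models from any standard client. A minimal sketch against Ollama's /v1 endpoint; the hostname and model tag are assumptions for illustration:

```python
from openai import OpenAI  # pip install openai

# Point any OpenAI-compatible client at the box that hosts the models.
# Host/port and model tag are assumptions, adjust for your setup.
client = OpenAI(
    base_url="http://homelab.local:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                          # ignored by Ollama, but the client requires one
)

reply = client.chat.completions.create(
    model="qwen2.5-coder:7b",  # e.g. a coding-tuned model for the devs in the house
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(reply.choices[0].message.content)
```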
I've used LLMs to rewrite code, help format PowerPoint slides, summarize my notes from work, create D&D characters, plan lessons, etc.
Sex chats. For other uses, just simple searches are better 99% of the time. And for the 1%, something like Kagi's FastGPT helps to find the correct keywords.
Yeah. I have a mini PC with an AMD GPU. Even if I were to buy a big GPU I couldn't use it. That frustrates me, because I'd love to play around with some models locally. I refuse to use anything hosted by other people.
Your M.2 port can probably fit an M.2-to-PCIe adapter, and you can use a GPU with that - Ollama supports AMD GPUs just fine nowadays (well, as well as it can; ROCm is still very hit or miss).
I messed around with Home Assistant and the Ollama integration. I've passed on it and just use the default assistant with the voice commands I set up. I couldn't really get Ollama to do or say anything useful. Like, I asked it what's a good time to run on a treadmill for beginners and it told me it's not a doctor.
There are some experimental models made specifically for use with Home Assistant, for example home-llm.
Even though they're tiny (1-3B), I've found them to work much better than even 14B general-purpose models. Obviously they suck for general-purpose questions, just by their size alone.
That being said they're still LLMs. I like to keep the "prefer handling commands locally" option turned on and only use the LLM as a fallback.
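The "prefer handling commands locally" idea boils down to a simple pattern: try a cheap local match first and only hit the model when that fails. A toy sketch of the pattern (not Home Assistant's actual code; the intent table and model tag are made up for illustration):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

# Toy stand-in for the built-in local sentence/intent matching.
LOCAL_INTENTS = {
    "turn on the lights": "lights on",
    "turn off the lights": "lights off",
}

def handle_command(text: str) -> str:
    """Handle commands locally when possible; use the LLM only as a fallback."""
    key = text.lower().strip()
    if key in LOCAL_INTENTS:
        return LOCAL_INTENTS[key]

    # Anything unrecognized goes to the local model.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.2", "prompt": text, "stream": False},  # model tag is an assumption
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(handle_command("turn on the lights"))          # handled locally
print(handle_command("what's a good warm-up run?"))  # falls back to the LLM
```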
Haha, that is hilarious. Sounds like it gave you some snark. afaik you have to clarify by asking again when it says such things. "I'm not asking for medical advice, but..."
Once I changed the default model, immich search became amazing. I want to show it off to people but alas, way too many NSFW pics in my library. I would create a second "clean" version to show off to people but I've been too lazy.
I run ollama and auto1111 on my desktop when it's powered on.
I use open-webui in my homelab, always on, and also connected to OpenRouter.
This way I can always use Open WebUI with OpenRouter models; it's pretty cheap per query and a little more private than using a big-tech chatbot. And if I want local, I turn on the desktop and have local Llama and Stable Diffusion.
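Since OpenRouter and Ollama both speak the OpenAI wire format, flipping between the cheap hosted query and the local desktop is basically a base-URL swap. A rough sketch; the model names, hostname, and toggle are assumptions, not anyone's exact setup:

```python
import os
from openai import OpenAI  # pip install openai

USE_LOCAL = os.getenv("USE_LOCAL_LLM") == "1"  # hypothetical toggle

if USE_LOCAL:
    # The desktop running Ollama (only when it's powered on).
    client = OpenAI(base_url="http://desktop.lan:11434/v1", api_key="ollama")
    model = "llama3.1:8b"              # assumption: whatever local tag you use
else:
    # OpenRouter: cheap per query, a bit more private than a big-tech chatbot UI.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    model = "mistralai/mistral-small"  # assumption: pick any cheap OpenRouter model

reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Summarize RAID levels in three bullet points."}],
)
print(reply.choices[0].message.content)
```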
I also get bugger-all benefit out of it; it's a cute toy.
LM Studio is pretty much the standard. I think it's open source except for the UI. Even if you don't end up using it long-term, it's great for getting used to a lot of the models.
Otherwise there's Open WebUI, which I'd imagine would work via Docker Compose, as I think there are ARM images for OWU and Ollama.
Fair enough, but it's damn handy and simple to use. And I don't know how to do speculative decoding with ollama, which massively speeds up the models for me.
I've got an old gaming PC with a decent GPU lying around and I've thought of doing that (currently I use it for Linux gaming and GPU-related tasks like photo editing, etc.). However, I'm currently stuck using LLMs on demand locally with Ollama. The energy cost of having it powered on all the time for on-demand queries seems a bit overkill to me…
That sounds like a great way of leveraging existing infrastructure! I host Plex together with other services on a server with an Intel CPU capable of hardware transcoding. I'm quite sure I would get much better performance with the GPU machine, so I might end up following this path!
I was able to run a distilled version of DeepSeek on Linux. I ran it inside a Podman container with ROCm support (I have an AMD GPU). It wasn't super fast, but for a locally deployed and self-hosted option the performance was okay. Apart from that, I have deployed Fooocus for image generation in a similar manner. Currently, I am working on deploying Stable Diffusion with either ComfyUI or Automatic1111 inside a Podman container with ROCm support.
It's a cluster of workers where anyone can generate images/text using the workers connected to the service.
So if you ran a worker, people could generate stuff using your PC. For that you would gain kudos, which in turn you can use to generate stuff on other people's computers.
Basically you do two things: help people without access to powerful machines, and earn kudos while your machine has spare capacity so you can generate whenever you want, even on the road where you can't turn on your PC, if you fancy.
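If this is the AI Horde (the kudos system suggests it is), the client side is just submit-then-poll over REST. A rough sketch from memory; treat the endpoint paths, field names, and the anonymous API key as assumptions and check the Horde API docs before relying on it:

```python
import time
import requests

API = "https://aihorde.net/api/v2"  # assumption: AI Horde base URL
HEADERS = {"apikey": "0000000000"}  # assumption: anonymous key, earns no kudos

# Submit a text generation job to the pool of volunteer workers.
job = requests.post(
    f"{API}/generate/text/async",
    headers=HEADERS,
    json={"prompt": "Write a limerick about homelabs.", "params": {"max_length": 120}},
    timeout=30,
).json()

# Poll until a worker somewhere has finished the job.
while True:
    status = requests.get(f"{API}/generate/text/status/{job['id']}", timeout=30).json()
    if status.get("done"):
        print(status["generations"][0]["text"])
        break
    time.sleep(5)
```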
I spun up Ollama and paperless-gpt to add an AI OCR sidecar to paperless-ngx. It's okay. It can read handwritten stuff okay-ish, which is better than Tesseract (which doesn't read handwriting at all), so I throw handwritten stuff at it, but the difference on typed text is marginal, at least in the single day I spent testing 3 different models on a few different typed receipts.
I tried minicpm-v, granite3.2-vision, and mistral.
Granite didn't work with paperless-gpt at all. Mistral worked sometimes, but it also sometimes just kept running and didn't finish within a reasonable time (15 minutes for 2 pages). minicpm-v finishes every time, but I just looked at some of the results and it seems as though it's not even worth keeping it running either. I suppose the first one I tried that gave me a good impression was maybe a fluke.
To be fair, I'm a noob at local AI, and I also don't have a good GPU (GTX 1650). So these failures could all be self-induced. I like the idea of AI-powered OCR, so I'll probably try again in the future...
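If anyone wants to poke at the same idea without the paperless-gpt sidecar, Ollama's API accepts base64 images for multimodal models, so you can test a vision model's OCR by hand. A minimal sketch; the prompt and file name are placeholders, and real results will vary as described above:

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

# Load a scanned receipt and base64-encode it for the multimodal model.
with open("receipt.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "minicpm-v",  # one of the vision models mentioned above
        "prompt": "Transcribe all text in this document exactly as written.",
        "images": [image_b64],  # Ollama passes images to multimodal models this way
        "stream": False,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```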
Claude is the standard that all others are judged by. But it's not cheap.
Gemini is pretty good, and Qwen-coder isn't bad. I'd suggest you watch a few vids on GosuCoder's YT channel to see what works for you; he reviews a pile of them and his coverage is quite up to date.
And if you use VS Code, I highly recommend the Roo Code extension. GosuCoder also goes into revising the Roo Code prompt to reduce costs for Claude. Another extension worth a look is Cline.
AFAIK Ollama would fit that bill, but perhaps others can chime in. You could probably run it on your local computer with a small model on CPU alone.
I haven't sunk much time into it, but I'm not aware of any training data focusing on code only. There's nothing preventing me from running with general-purpose data, but I imagine I'd get a snappier response with a smaller, focused dataset, without losing accuracy.
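Tying the two comments together: a small coding-tuned model (e.g. the Qwen-coder family mentioned upthread) is the kind of thing you can try on CPU alone through Ollama. A minimal sketch, assuming the tag below is available in the Ollama library and fits in your RAM:

```python
import requests

# Ask a small coding-tuned model a programming question via Ollama's chat endpoint.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5-coder:1.5b",  # assumption: small coding-tuned tag, CPU-friendly
        "messages": [
            {"role": "user", "content": "Write a Python function that reverses a linked list."}
        ],
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```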