Posts: 0
Comments: 71
Joined: 2 yr. ago

  • I initially installed Ollama/OpenWebUI on my HP G4 Mini, but it obviously has no GPU, so with 16GB of RAM I could run 7b models but only at 2 or 3 tokens/sec.
    It definitely made me regret not buying a bigger case that could accommodate a GPU, but I ended up installing the same Ollama/OpenWebUI pair on my Windows desktop with a 3060 12gb and it runs great: 14b models at 15+ tokens/sec.
    Even better, I figured out that the reverse proxy on my server can forward requests to other addresses on my network, so now I just have a dedicated subdomain URL for my desktop instance. Its OpenWebUI is now just as accessible remotely as my server's.
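A rough sketch of what that kind of host-based forwarding boils down to: the proxy looks at the hostname in the request and picks the machine to pass it to. The hostnames, LAN addresses, and ports below are made up for illustration, and a real reverse proxy also handles TLS, headers, and websockets.

```python
from typing import Optional

# Hypothetical subdomains and LAN addresses; substitute your own.
ROUTES = {
    "ai.example.com": "http://127.0.0.1:8080",             # OpenWebUI on the server itself
    "desktop-ai.example.com": "http://192.168.1.50:8080",  # OpenWebUI on the desktop
}


def pick_upstream(host_header: str) -> Optional[str]:
    """Return the upstream address for a request's Host header, or None if unknown."""
    # Drop any ":port" suffix and normalise case before the lookup.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(hostname)
```

Calling `pick_upstream("desktop-ai.example.com:443")` returns the desktop's address, while an unknown hostname returns `None`, which a proxy would typically answer with an error page.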

  • When on your wifi, try navigating in your browser to your Windows computer's address with a colon and the port 11434 at the end. It would look something like this:

    http://192.168.xx.xx:11434/

    If it works, your browser will just load the text: Ollama is running

    From there you just need to figure out how you want to interact with it. I personally pair it with OpenWebUI for the web interface.
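The same browser check can be scripted. This is a minimal sketch using only the Python standard library; the function names are my own, and the address in the usage note is a placeholder for whatever your Windows machine's LAN address actually is.

```python
import urllib.request


def ollama_url(host: str, port: int = 11434) -> str:
    """Build the base URL described above: address, colon, port."""
    return f"http://{host}:{port}/"


def is_ollama_running(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if the page at host:port contains the text 'Ollama is running'."""
    try:
        with urllib.request.urlopen(ollama_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors (URLError is an OSError).
        return False
    return "Ollama is running" in body
```

For example, `is_ollama_running("192.168.1.50")` (a placeholder address) should return `True` from another machine on the same network if the server is reachable. If it returns `False`, one possible cause is that Ollama is only listening on localhost or that a firewall is blocking the port.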

  • Not really sure I understand how these work. Do you just feed it a large text document, like a transcript, and it turns it into a more machine-readable vector format or something?

    Or is it just a much smaller LLM that's more optimized for reading than generating?

  • The problem is that big businesses like Temu can bulk ship and still only pay a certain percentage.

    But it will ruin small businesses that only do small shipments and will now see a flat fee that may be half or more of the value of the goods.

  • So I googled it, and if you have a Pi 5 with 8gb or 16gb of RAM it is technically possible to run Ollama, but the speeds will be excruciatingly slow. My Nvidia 3060 12gb typically runs 14b (billion-parameter) models at around 11 tokens per second. This website shows a Pi 5 running an 8b model at only 2 tokens per second, so each query will literally take 5-10 minutes at that rate:
    Pi 5 Deepseek
    It also shows you can get a reasonable pace out of the 1.5b model, but those are whittled down so much that I don't believe they're really useful.

    There are lots of lighter weight services you can host on a Pi though, I highly recommend an app called Cosmos Cloud, it's really an all-in-one solution to building your own self-hosted services - it has its own reverse proxy like Nginx or Traefik including Let's Encrypt security certificates, URL management, and incoming traffic security features; it has an excellent UI for managing docker containers and a large catalog of prepared docker compose files to spin up services with the click of a button; it has more advanced features you can grow into using like OpenID SSO manager, your own VPN, and disk management/backups.
    It's still very important to read the documentation thoroughly and expect occasional troubleshooting will be necessary, but I found it far, far easier to get working than a previous Nginx/Docker/Portainer setup I used.
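To put the tokens-per-second figures above in perspective, here is the back-of-the-envelope arithmetic. The response lengths (600 and 1200 tokens) are my own assumption, chosen because they match the 5-10 minute estimate; real responses vary a lot, and this ignores the time spent processing the prompt.

```python
def generation_minutes(response_tokens: int, tokens_per_second: float) -> float:
    """Minutes needed to generate a response of the given length at a steady rate."""
    return response_tokens / tokens_per_second / 60


# Assumed response lengths, in tokens (illustrative only).
for tokens in (600, 1200):
    pi_minutes = generation_minutes(tokens, 2)    # Pi 5 figure quoted above
    gpu_minutes = generation_minutes(tokens, 11)  # 3060 12gb figure quoted above
    print(f"{tokens} tokens: Pi 5 {pi_minutes:.1f} min, 3060 {gpu_minutes:.1f} min")
```

At 2 tokens per second a 600-token answer takes 5 minutes and a 1200-token answer takes 10, while at 11 tokens per second the same answers take roughly one to two minutes.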

  • Using Ollama depends a lot on the equipment you run it on. You should aim to have at least 12gb of VRAM/unified memory to run models. I have one copy running in a Docker container using CPU on Linux and another running on the GPU of my Windows desktop, so I can give install advice for either OS if you'd like.
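As a rough guide to why memory matters so much, the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below assumes 4-bit quantization, which is common for locally run models but is my assumption here, and it leaves out the context cache and runtime overhead, so treat the numbers as a lower bound.

```python
def weight_gigabytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


for size in (7, 14, 30):
    print(f"{size}b at 4-bit: about {weight_gigabytes(size, 4):.1f} GB of weights")
```

By this estimate a 14b model at 4 bits needs about 7 GB for weights, which leaves some headroom on a 12gb card, while the same model at 16 bits would need about 28 GB.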

  • I'm actually right there with you. I have a 3060 12gb and tbh I think it's the absolute most cost-effective GPU option for home use right now. You can run 14B models at a very reasonable pace.
    Doubling or tripling the cost and power draw just to get 16-24gb doesn't seem worth it to me. If you really want an AI-optimized box I think something with the new Ryzen AI Max chips would be the way to go, like an ASUS ROG Flow Z13, Framework Desktop or the GMKtec option, whatever it's called. Apple's new Mac Minis are also great options. Both Ryzen AI Max and Apple make use of shared CPU/GPU memory, so you can go up to 96GB+ at much, much lower power draw.

  • Qwen3 Coder is the current top dog for coding afaik. There's a 30b size and something bigger, but I can't remember what because I have no hope of running it lol. But I think the larger models have up to a million-token context window.

  • Sony is even worse than ASUS when it comes to how long it provides software and security updates. It hasn't been a deal breaker for me, but I have an Xperia 5 III and at only 3.5 years I am beginning to have issues with my fingerprint reader.
    Apparently it's a fairly widespread issue that moisture can get in at the edges and degrade the backing, and to this day they are still using the same fingerprint reader design. I'm pretty irritated about it. I feel like I could get at least another year out of this phone otherwise, but it's surprisingly annoying to have to go back to passwords for app logins.

  • SSO is single sign-on, so you don't need an individual username and password for every service. It's a bit more advanced, so don't worry about it until you've had what you want working properly for a while.

    DNS is like the yellow pages of the internet: when you type www.google.com your computer uses a DNS server to look up what actual IP address corresponds to the website name. The point of AdGuard or Pi-hole is that when a website tries to load an ad, your custom DNS server just says it doesn't recognize the address.
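Here is a toy model of that lookup-with-blocklist idea. It uses an in-memory table instead of real DNS, and every name and address in it is made up. Real blockers differ in exactly what they answer for a blocked name; an unroutable address like 0.0.0.0 or a "no such domain" reply are both common choices, and this sketch assumes the former.

```python
from typing import Optional

# Made-up records standing in for what an upstream DNS server would answer.
RECORDS = {
    "www.example.com": "203.0.113.10",
    "ads.example.net": "203.0.113.99",
}

# Made-up blocklist of ad-serving hostnames.
BLOCKLIST = {"ads.example.net"}

SINKHOLE = "0.0.0.0"  # assumed blocked-name answer; leads nowhere


def resolve(name: str) -> Optional[str]:
    """Look up a hostname, answering with the sinkhole address if it is blocked."""
    hostname = name.strip().rstrip(".").lower()
    if hostname in BLOCKLIST:
        return SINKHOLE
    return RECORDS.get(hostname)  # None means the name is unknown
```

With this table, `resolve("www.example.com")` returns the site's address as usual, while `resolve("ads.example.net")` returns `0.0.0.0`, so the ad request never reaches a real server.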

  • Check out Cosmos. I struggled piecing things together, but when I restarted from scratch with this as the base it has been SO much easier to get services working, while still being able to see how things work under the hood.

    It's basically a Docker manager with an integrated reverse proxy and OpenID SSO capability, plus optional VPN and storage management.