Don't overlook llama.cpp's rpc-server feature.
ThreeJawedChuck @ThreeJawedChuck@sh.itjust.works | Posts 2 | Comments 11 | Joined 3 mo. ago
I have to correct myself. It appears newer versions of rpc-server have a cache option and you can point them to a locally stored version of the model to avoid the network cost.
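For anyone who wants to try this, here is a rough sketch of what the two command lines might look like, written as a small Python helper that only builds and prints them. It assumes a recent llama.cpp build where rpc-server accepts `--host`, `--port` and `--cache`, and where the client takes `--rpc` with a comma-separated list of workers. The helper names, the IP address, the port and the model filename are all made up for illustration, so check `rpc-server --help` on your own build before relying on any of it.

```python
import shlex


def rpc_server_command(host="0.0.0.0", port=50052, use_cache=True, binary="rpc-server"):
    """Build the argv for a worker machine (hypothetical helper, not part of llama.cpp)."""
    argv = [binary, "--host", host, "--port", str(port)]
    if use_cache:
        # Assumption: newer rpc-server builds expose the cache option as --cache.
        argv.append("--cache")
    return argv


def client_command(model_path, servers, binary="llama-cli"):
    """Build the argv for the main machine that offloads work to the workers."""
    # --rpc takes the workers as a comma-separated list of host:port pairs.
    return [binary, "-m", model_path, "--rpc", ",".join(servers), "-ngl", "99"]


if __name__ == "__main__":
    # Placeholder address and model name; replace with your own.
    print(shlex.join(rpc_server_command()))
    print(shlex.join(client_command("model.gguf", ["192.168.1.10:50052"])))
```

The script does not start anything itself. It just prints the commands so you can compare them against your build's help output and then run them by hand on each machine.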