High-Speed Large Language Model Serving on PCs with Consumer-Grade GPUs

GitHub - SJTU-IPADS/PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

There is a discussion on Hacker News, but feel free to comment here as well.