Hacker News @lemmy.smeargle.fans

70B Llama 2 at 35 tokens/second on 4090

github.com

GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

LocalLLaMA @sh.itjust.works

Exllama V2 released! Available in Ooba! Big speed upgrades!
