70B Llama 2 at 35 tokens/second on 4090

GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
