Models for 16 GB VRAM?

What models are currently good for coding tasks? I just ran Qwen3-14B-Q6_K.gguf with llama.cpp on my card with 16 GB of VRAM (plus 32 GB of DDR4 system RAM), but even a single short conversation comes really close to filling the entire VRAM, so I am looking for some smaller alternatives to test.
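
For context, here is a rough back-of-the-envelope sketch of where the VRAM might be going. Everything numeric in it is an assumption, not something measured: about 14.8B parameters, about 6.56 bits per weight for Q6_K, and 40 layers with 8 KV heads of dimension 128 for Qwen3-14B (check the GGUF metadata or the model's config.json), with an unquantized f16 KV cache. The function name `estimate_vram_gib` is made up for this example, and compute buffers and other overhead are ignored.

```python
GIB = 1024 ** 3


def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, n_ctx, kv_bytes_per_elem=2):
    """Return (weights_gib, kv_cache_gib) as a rough estimate.

    Ignores compute buffers, the CUDA context, and anything else the
    runtime allocates, so real usage will be somewhat higher.
    """
    # Quantized weights: parameters times average bits per weight.
    weights_bytes = n_params * bits_per_weight / 8
    # KV cache: one K and one V vector per layer, per KV head, per token.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_elem * n_ctx
    return weights_bytes / GIB, kv_bytes / GIB


if __name__ == "__main__":
    # ASSUMED values for Qwen3-14B at Q6_K; verify against the actual model.
    for n_ctx in (4096, 8192, 16384, 32768):
        weights, kv = estimate_vram_gib(
            n_params=14.8e9, bits_per_weight=6.5625,
            n_layers=40, n_kv_heads=8, head_dim=128, n_ctx=n_ctx,
        )
        print(f"ctx={n_ctx:>6}: weights ~{weights:.1f} GiB, "
              f"KV cache ~{kv:.2f} GiB, total ~{weights + kv:.1f} GiB")
```

If those assumptions hold, the weights alone take most of the card, and the KV cache is allocated for the whole configured context window rather than for the tokens actually used, which could explain why a short conversation already looks almost full. A smaller context size, a lower-bit quantization, or a smaller model would each shrink one of the two terms.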

I might throw an OpenCode container into the mix next, if that is relevant.

6 comments