Should be able to load the full version of DeepSeek R1 on this no prob
This is a good time to ask: I want to use AI on a local server (deepseek maybe, image generators like flux, ...) is there a cheaper alternative to flagship Nvidia cards which can do it?
Depends on your goals. For raw tokens per second, yeah you want an Nvidia card with enough(tm) memory for your target model(s).
But if you don't care so much for speed beyond a certain amount, or you're okay sacrificing some speed for economy, AMD RX 7900 XT/XTX or 9070 both work pretty well for small- to mid-sized local models.
Otherwise you can look at the SoC-type solutions like AMD Strix Halo or Nvidia DGX for more model size at the cost of speed, but always look for reputable benchmarks showing 'enough' speed for your use case.
From my reading, if you don't mind sacrificing speed (tokens/sec), you can run models in system RAM. To be usable though, you'd need at a minimum a dual-socket server/workstation for multi-channel RAM, plus enough RAM to fit the model.
So for something like DS R1, you'd need something like >512GB of RAM (rough math below).
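A rough back-of-the-envelope sketch in Python, assuming R1's ~671B total parameters and typical bytes-per-weight for common quantization levels (the ~15% headroom figure for KV cache and runtime overhead is a guess, not a measurement):

```python
# Back-of-the-envelope RAM estimate for holding model weights in system memory.
# Assumes DeepSeek R1's ~671B total parameters (it's a large MoE model).
PARAMS = 671e9

BYTES_PER_WEIGHT = {
    "fp16": 2.0,   # full half-precision weights
    "q8":   1.0,   # ~8-bit quantization
    "q4":   0.5,   # ~4-bit quantization (common for local inference)
}

for name, bpw in BYTES_PER_WEIGHT.items():
    weights_gb = PARAMS * bpw / 1e9
    # add ~15% headroom for KV cache, activations and runtime overhead (rough guess)
    total_gb = weights_gb * 1.15
    print(f"{name}: ~{weights_gb:,.0f} GB weights, ~{total_gb:,.0f} GB with headroom")
```

Even at ~4-bit that works out to roughly 350-400GB, which is why >512GB is the comfortable floor; fp16 blows past a terabyte.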
Assuming you haven't ruled this out already, test your plans out now using whatever computer you already own. At the hobbyist level you can do a lot with 8GB of RAM and no graphics card. 7B LLMs are really good now and they're only going to get better.
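If you want to try that on whatever you've got, here's a minimal CPU-only sketch using llama-cpp-python; the model path is a placeholder for whichever ~4-bit quantized 7B GGUF you download (any of those should fit in ~8GB RAM):

```python
# Minimal CPU-only test with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-7b-model.Q4_K_M.gguf",  # placeholder: any 4-bit 7B GGUF
    n_ctx=2048,        # modest context window keeps memory use down
    n_threads=4,       # match your physical core count
    n_gpu_layers=0,    # pure CPU; no graphics card needed
)

out = llm("Explain PCI vs PCIe in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```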
Wow. Doing some spring-cleaning? I might have one of those on my own small pile of e-waste. Can't even remember what kind of bandwidth the PCI bus had... probably enough to fill 128MB.
Lol I was digging for other parts in my "hardware archive" and came across this. I had actually forgotten about non-express PCI and thought it was AGP for a minute LMAO