As others have already mentioned, try qwen2.5-coder. With 16 GB, you should be able to comfortably fit a quantised version of the 14b variant into VRAM. You can also try the 32b variant, but it will be much slower because not all layers can be offloaded to the GPU.
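If you're using Ollama, getting it running looks roughly like this (the exact tag names are an assumption on my part, check the model library for the quant you want):

```bash
# pull a quantised 14b build of Qwen2.5-Coder (the default tag is a 4-bit quant)
ollama pull qwen2.5-coder:14b

# quick interactive test
ollama run qwen2.5-coder:14b "Write a Python function that reverses a linked list."

# check how much of the model actually landed on the GPU vs CPU
ollama ps
```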
In general, you want something fast for completion (probably something that fits entirely in your GPU/VRAM) so you get suggestions as fast as you can type.
For chat, you'll probably want the most intelligent/largest model you can run. It's likely fine if it's running on the CPU/RAM, since the quality of an individual answer matters more than the speed at which many small answers can be generated.
So, probably Qwen for both, but different sizes/quants for different use cases, something like the sketch below.
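A rough illustration with Ollama (the tags are assumptions on my part, pick whatever sizes/quants actually fit your hardware):

```bash
# small, fast model that fits fully in VRAM -> inline code completion
ollama pull qwen2.5-coder:7b

# larger, smarter model that can spill into system RAM -> chat / harder questions
ollama pull qwen2.5-coder:32b
```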
I have an RX 6700 XT and I needed to change an environment variable to make it work. Maybe something similar is needed for your GPU. I'd try googling something like "RX 9700 XT ROCM" or "RX 9700 XT ROCM no compatible GPUs were discovered" if you haven't done that already.
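For reference, this is what worked for me on the RX 6700 XT with Ollama; the exact value (and whether your card needs an override at all) will differ, so treat it purely as an illustration:

```bash
# the RX 6700 XT (gfx1031) isn't officially supported by ROCm;
# pretending to be gfx1030 made Ollama pick up the GPU for me
export HSA_OVERRIDE_GFX_VERSION=10.3.0
ollama serve
```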
Update:
After updating to the latest kernel (6.14) and removing the old amdgpu drivers that I had manually installed, I am now running on my GPU. I'm running DeepSeek Coder 33B, and it generates approx. 6 words/second.
I am running local models only for privacy-sensitive stuff. If you have Ollama, you can also set up Open WebUI and access both local and remote models through the same very nice interface! Also, the ChatGPT API is much cheaper than subscribing.
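If it helps, the usual way to run Open WebUI next to a local Ollama is via Docker; something along these lines (the port mapping and volume name are just the defaults from their docs, adjust to taste):

```bash
# run Open WebUI and let it reach the Ollama instance on the host
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# then open http://localhost:3000 in a browser;
# remote models (e.g. the OpenAI API) can be added with an API key in the web UI
```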