Have you considered that this person is just an asshole? I don't even know anyone who uses voice messages, though maybe some who use the speech-to-text feature to send a message. But voice messages aren't a real way to communicate IMO. If they want to talk to me they're gonna have to call me, so I can casually ignore it since I hate answering the phone.
I know it's an absolutely wonky workaround, but you could use a second phone and enable Google speech input, or a FOSS alternative: FUTO Voice Input (a local model that works pretty great. Better than Google imo: it's better at finding the correct words and also at putting in logical punctuation, as in when a comma or period should appear).
Now you enable speech input on one phone and play back the dude's voice message on the other. Now you've got all the text.