Have a look at Home Assistant! It’s a great open source smart home platform that recently released a local (so not processing requests in the cloud) voice assistant. It’s pretty neat!
I have one big frustration with it: your voice input has to be understood PERFECTLY by the speech-to-text system.
If you have a "To Do" list and you say "Add cooking to my To Do list", it will do it! But if the STT system understood:
Todo
To-do
to do
ToDo
To-Do
...
The system will say it couldn't find that list. Same for the names of your lights, asking for the time, and so on, and you have very little control over this.
HA Voice Assistant either needs to find a PERFECT match, or you need to be running a full-blown LLM as the backend, which honestly works even worse in many ways.
They recently added the option to use LLM as fallback only, but for most people's hardware, that means that a big chunk of requests take a suuuuuuuper long time to get a response.
I do not understand why there's no option to just fall back to the most similar command when there isn't a perfect match, using something like Levenshtein distance.
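For what it's worth, that kind of fallback is only a few lines of code. Here's a minimal sketch in Python (not Home Assistant's actual code; the list names and the distance threshold are made up for illustration) of matching whatever the STT engine produced against the closest known list name:

```python
# Sketch: pick the closest known list name for whatever the STT engine heard.
# The known_lists and max_distance values below are invented for this example.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def closest_list_name(heard: str, known: list, max_distance: int = 3):
    """Return the known list whose normalized name is closest to what was
    heard, or None if nothing is within max_distance edits."""
    def norm(s: str) -> str:
        # Ignore case, spaces, and punctuation so "To-Do" == "todo" == "To Do".
        return "".join(ch for ch in s.lower() if ch.isalnum())
    scored = [(levenshtein(norm(heard), norm(name)), name) for name in known]
    distance, best = min(scored)
    return best if distance <= max_distance else None

known_lists = ["To Do", "Shopping", "Movies to watch"]
print(closest_list_name("Todo", known_lists))       # -> "To Do"
print(closest_list_name("To-do", known_lists))      # -> "To Do"
print(closest_list_name("groceries", known_lists))  # -> None (nothing close)
```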
I didn't even know this was a feature. My understanding has always been that Echo devices work as follows.
1. Store a constant small buffer of the past few seconds of audio.
2. Locally listen for the wake word (typically "Alexa") using onboard hardware. (This is why you cannot use arbitrary wake words.)
3. Upon hearing the wake word, send the buffer from step 1 along with any fresh audio to the cloud to process what was said.
4. Act on what was said (turn lights on or off, play Spotify, etc.).
Unless they made some that could do step 3 entirely locally, I don't see this as a big deal. They still have to do step 4 remotely.
Also, while they may be "always recording," they don't transmit everything. The buffer just exists so that if you say "Alexaturnthelightsoff" really fast, it has a better chance of catching the full sentence.
I'm not trying to defend Amazon, and I don't necessarily think this is great news or anything, but it doesn't seem like that big of a deal unless they made a lot of devices that could parse all speech locally and I just didn't know about it.
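For anyone curious, the buffer-then-stream behavior in steps 1-3 looks roughly like this. This is just an illustrative Python sketch; the frame size, buffer length, and helper functions are invented stand-ins, not Amazon's firmware:

```python
from collections import deque

FRAME_MS = 20                         # one mic frame every 20 ms (assumed)
BUFFER_SECONDS = 2                    # keep roughly the last 2 seconds of audio
ring = deque(maxlen=BUFFER_SECONDS * 1000 // FRAME_MS)

def detect_wake_word(frame: bytes) -> bool:
    # Stand-in for the on-device keyword spotter. Real devices run a small
    # fixed keyword model here, which is why you can't pick arbitrary wake words.
    return b"alexa" in frame.lower()

def send_to_cloud(frames) -> None:
    # Stand-in for streaming audio up to the speech service.
    print(f"uploading {len(frames)} frame(s)")

def on_audio_frame(frame: bytes, streaming: bool) -> bool:
    """Handle one captured frame; return whether we're now streaming to the cloud."""
    ring.append(frame)                               # step 1: rolling buffer
    if not streaming and detect_wake_word(frame):    # step 2: local wake word
        send_to_cloud(list(ring))                    # step 3: flush the buffer,
        return True                                  #         then keep streaming
    if streaming:
        send_to_cloud([frame])                       # step 3 cont.: fresh audio
    return streaming

# Tiny simulation: silence, then "Alexaturnthelightsoff" said very fast.
streaming = False
for frame in [b"...", b"...", b"alexaturn", b"thelights", b"off"]:
    streaming = on_audio_frame(frame, streaming)
```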
If you traveled back in time and told J. Edgar Hoover that in the future the American public would voluntarily wiretap themselves, he would cream his frilly pink panties.
How disheartening. I knew going in that there would be privacy issues but I figured for the service it was fine. I also figure my phone is always listening anyway.
As someone with limited mobility, my echo has been really nice to control my smart devices like lights and TV with just my voice.
Are there good alternatives or should I just accept things as they are?
There aren't any immediate drop-in replacements that won't require some work, but there is Home Assistant Voice. It just requires that you also have a Home Assistant server set up, which is the more labor-intensive part. It's not hard, just a lot to learn.
They have doorbells to watch who comes to your house and when.
Indoor and outdoor security cameras to monitor when you go outside, for how long, and why.
They acquired Roomba, which not only maps out your house but also has little cameras on board, giving them another angle to monitor the more personal areas of your house that indoor cameras might not see.
They have the Alexa products, meant to record you at all times for their own use.
Why do you think that along with an Amazon Prime subscription you get free cloud storage, free video streaming, and free music? They are categorizing you in the most efficient and accurate way possible.
be aware, everything you say around amazon, apple, alphabet, meta, and any other corporate trash products are being sold, trained on, and sent to your local alphabet agency. it's been this way for a while, but this is a nice reminder to know when to speak and when to listen
So... if you own an inexpensive Alexa device, it just doesn't have the horsepower to process your requests on-device. Your basic $35 device is just a microphone and a Wi-Fi streamer (OK, it also handles buttons and fun LED light effects). The Alexa device SDK can run on a $5 ESP32; that's how little needs to run on-site.
Everything you say gets sent to the cloud, where it is NLP-processed, parsed, and turned into command intents that are matched against the devices and services you've installed. It matches the phrase against "slots" and returns results, which are then turned into voice and played back on the speaker.
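If "slots" sounds abstract, here's a toy Python version of the idea. The intent names, phrases, and device names are made up for this example; Amazon's real interaction model is defined in JSON and is far more involved:

```python
import re

# Toy illustration of matching a transcribed utterance against intent phrases
# with named slots.
INTENTS = {
    "TurnOnIntent":   "turn on the {device}",
    "TurnOffIntent":  "turn off the {device}",
    "SetTimerIntent": "set a timer for {minutes} minutes",
}

def phrase_to_regex(phrase: str):
    # "set a timer for {minutes} minutes" -> ^set a timer for (?P<minutes>.+?) minutes$
    return re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", phrase) + "$",
                      re.IGNORECASE)

def match_intent(utterance: str):
    """Return (intent_name, slot_values) for the first matching phrase, else None."""
    for name, phrase in INTENTS.items():
        m = phrase_to_regex(phrase).match(utterance.strip())
        if m:
            return name, m.groupdict()
    return None

print(match_intent("turn off the kitchen lights"))
# -> ('TurnOffIntent', {'device': 'kitchen lights'})
print(match_intent("set a timer for 8 minutes"))
# -> ('SetTimerIntent', {'minutes': '8'})
```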
With the new LLM-based Alexa+ service, it's all in the cloud; very little of the processing can happen on-device. If you want to use the service, don't be surprised that the voice commands end up in the cloud. In most cases, they already did.
If you don't like it, look into Home Assistant. But last I checked, to keep everything local and not too laggy, you'll need a super beefy (expensive) local home server. Otherwise, it's shipping your audio bits out to the cloud as well. There's no free lunch.
Off-device processing has been the default from day one. The only thing changing is the removal of local processing on certain devices, likely because the new backing AI model will no longer be able to run on that hardware.
With on-device processing, they don’t need to send audio. They can just send the text, which is infinitely smaller and easier to encrypt as “telemetry”. They’ve probably got logs of conversations in every Alexa household.
Easy fix: don't buy this garbage in the first place. It's terrible for the environment, terrible for your privacy, and of dubious value to begin with.
If every man is an onion, one of my deeper layers is curmudgeon. So take that into account when I say fuck all portable speakers. I'm so tired of hearing everyone's shitty noise. Just fucking everywhere. It takes one person feeling entitled to blast the shittiest music available to ruin the day of everyone within a 500-yard radius. If this is you, I hope you stub your toe on every coffee table, hit your head on every door jamb, and miss every bus.
The part that really gets me is that you have to opt out to keep everything you say from being saved. Bonkers that that isn't the default! There's no good user-facing reason for it. Alexa doesn't remember shit for users; like any AI, there's no recall feature. You can't say, "Remember what I told you last night? Give me the address for that place, I was drunk and don't remember the name."
Today: "...they will be deleted after Alexa processes your requests."
Some point in the not-so-distant future: "We are reaching out to let you know that your voice recordings will no longer be deleted. As we continue to expand Alexa's capabilities, we have decided to no longer support this feature."
And finally: "We are reaching out to let you know that Alexa key-phrase-based activation will no longer be supported. For better personalization, Alexa will now always process audio in the background. Don't worry, your audio is safe with us; we care deeply about your privacy."
Me while cooking mac and cheese for the kids:
"Echo, set timer for 8 minutes"
Echo: "GOOD EVENING [me], SETTING TIMER FOR 8 MINUTES"
No, shut the fuck up and just set the goddamn timer without the extra fluff. I've seen Ex Machina, I know you have no empathy, so knock off the "nice" shit and do what I fucking ask without anything else.