AI denoising promises a signal-to-noise-ratio improvement of up to 9-10 dB, better than what is possible with current directional microphone technology (either twin-mic or binaural beamformers using four mics).
Research is somewhat mixed, but some sources like Oticon suggest up to a 10% increase in speech understanding for each dB of signal-to-noise-ratio improvement.
Current directional technology maxes out at around 6 dB of signal-to-noise improvement, so if AI delivers 9-10 dB, that extra 3-4 dB could translate into an additional 30-40% of speech understanding from the AI denoising technology.
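To make the arithmetic explicit, here's a back-of-envelope sketch of that estimate. The ~10% per dB figure and the dB values are the ones quoted above, not manufacturer-verified data:

```python
# Back-of-envelope estimate: extra speech understanding from better SNR.
# Assumptions (illustrative only): ~10% intelligibility gain per dB of SNR
# improvement, ~6 dB ceiling for current directional mics, 9-10 dB claimed for AI denoising.

GAIN_PER_DB = 0.10          # ~10% per dB (the Oticon figure cited above)
CURRENT_DIRECTIONAL_DB = 6  # rough ceiling of twin-mic / binaural beamformers
AI_CLAIMED_DB = (9, 10)     # claimed range for AI denoising

for ai_db in AI_CLAIMED_DB:
    extra_db = ai_db - CURRENT_DIRECTIONAL_DB
    extra_understanding = extra_db * GAIN_PER_DB
    print(f"{ai_db} dB total -> {extra_db} dB extra -> ~{extra_understanding:.0%} more speech understanding")

# Prints roughly 30% and 40%, matching the 30-40% figure above.
```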
I can’t wait to see reviews of this technology - as others have mentioned, it’s going to be interesting to see how it behaves in unusual environments.
Phonak published a blog/vlog on AI and AutoSense just last month.
The blurb - ‘In this episode of The AI Hearing Clinic, hosted by Chase Smith, Senior Manager Commercial Enablement at Phonak, we explore how Phonak AutoSense OS uses AI-based machine learning to seamlessly transition between noisy environments and quiet settings for clear conversations’
So…more Machine Learning than AI. But anything with AI in it causes the world to go nuts…
(Watching their horizon-scanning promo for the art of the possible, Phonak seem confident they will solve speech in noise by 2030 and have devices that do real-time translation for you. Oh, and natty spatial-computing glasses will be widely available; at least the high-end audiologists at Phonak will be using them.)
Ah yes- we should probably distinguish between “indirect” and “direct” AI…
The former would be using AI to improve things like when and how AutoSense switches programs, and to program smarter, more flexible filtering of audio inputs.
“Direct” (real-time) AI would be much fancier (and computing-heavy), like sorting out multiple conversations at a cocktail party and, say, projecting separate, colored subtitles onto the smart glasses you wear. The latter requires, imho, a real-world understanding of what is being said (this, I still suspect, is a brain activity, not a proximate sensory one). With those applications we are also in the realm of Musk’s brain-implanted chips…
You can make the analogy with how AI improved chess- and Go-playing software. In the early days this was written, step by step, by human programmers. These days the playing strength comes from neural networks trained through self-play trial and error, producing play that is (far) superior, but nobody can fully explain how it works. When such software runs, you can argue whether AI is (still) at work or not.
And I thought Oticon was annoying with its pre-release hype. Just release the freaking thing. And while I’m at it: if they are smart enough to develop a noise-separation engine that has hearing health professionals gasping in amazement, as shown in their promotional videos, then implementing LE Audio - the new-generation Bluetooth audio protocol that the hearing aid manufacturers’ association asked the Bluetooth SIG to develop way back when - should be a simple walk in the park.
Yes indeed, but I can’t help feeling there’s some marketing sleight of hand. Say ‘AI’ to your average Joe/Joanna and they’re thinking ChatGPT, maybe DALL-E, or one of the various ‘assistants’. Maybe even Apple Intelligence.
Still, if the next-gen Phonaks can process data better/faster, then it’s all to the good, I guess.
Right- AI is being used as a new tool behind the scenes to improve the software our HAs run on, not some magic in super-chips that we get to enjoy behind our ears, doing extremely clever things on the go. The proof will be in the pudding, as the saying goes.
I have the same questions. And at least in my case, I don’t think training it on my personal experiences would help much. I can only think of two possibilities:
It learns to recognize particular individuals’ voices, and to pick those voices out of a crowd. For some, this would be useful. For others, it wouldn’t be. I would generally want my husband’s voice amplified in a noisy restaurant. But in other situations, it’s much more variable. At group social or work events in a noisy setting, I want the voices of those I’m trying to converse with amplified. That doesn’t mean I want the same people’s voices amplified if they move further away and are talking to someone else.
As @bigaltavista said, this could have applications for directional beamforming. I don’t doubt that AI can be used to more effectively eliminate voices or other noise outside the cone. Depending on how big the difference is, I might even upgrade just for that. But it’s just doing the same thing more effectively, not doing anything truly new or different. There are lots of situations where directional beamforming isn’t ideal, and if that’s all this is, people who bought into the marketing promising something revolutionary are going to be disappointed.
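For context on what conventional directional beamforming does before any AI is layered on, here's a minimal delay-and-sum sketch. This is a generic textbook method, not Phonak's actual algorithm, and the sample rate, mic spacing and angles are made-up assumptions:

```python
import numpy as np

# Minimal two-mic delay-and-sum beamformer (generic textbook method, for
# illustration only; real hearing-aid beamformers are far more sophisticated).
FS = 16_000             # sample rate in Hz (assumed)
MIC_SPACING_M = 0.012   # ~12 mm between front and rear mic (assumed)
SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(front: np.ndarray, rear: np.ndarray, look_angle_deg: float) -> np.ndarray:
    """Steer the two-mic array toward look_angle_deg (0 = straight ahead)."""
    # Time difference of arrival between the mics for the look direction.
    tau = MIC_SPACING_M * np.sin(np.deg2rad(look_angle_deg)) / SPEED_OF_SOUND
    # Integer-sample delay for simplicity; real systems use fractional delays.
    shift = int(round(tau * FS))
    aligned_rear = np.roll(rear, -shift)
    # Signals from the look direction add coherently; off-axis sound partially cancels.
    return 0.5 * (front + aligned_rear)

# Toy usage with random signals, just to show the call shape.
rng = np.random.default_rng(0)
front = rng.standard_normal(FS)  # one second per mic
rear = rng.standard_normal(FS)
steered = delay_and_sum(front, rear, look_angle_deg=0.0)
print(steered.shape)  # (16000,)
```

The marketed AI layer would presumably sit on top of (or replace) something like this, estimating which parts of the signal are speech rather than relying on geometry alone.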
GPUs are not designed from the ground up for neural networks. They are more cross-purposed devices. Chips specifically designed to perform neural network functions are commonly called NPUs these days. They have much better TDPs than GPUs for neural network processing (you don’t need a kW power supply!). I would suspect that even advanced audio processing has lower computational requirements than video processing (just based on typical audio bandwidth vs. video bandwidths). So, as RobHoft said, hearing will be believing, but I don’t see a need to be highly cynical out of the starting gate. A measly Raspberry Pi 5 with a 14 TOPS NPU can ~instantaneously identify objects and poses in videos. I think in audio processing folks would be willing to accept delays of a few seconds at least. Jerking audio settings this way and that as environmental sounds fluctuate is disconcerting to most folks. https://youtu.be/HgIMJbN0DS0 (Jeff Geerling YouTube on Raspberry Pi 5 NPU AI Kit setup, power consumption, and capabilities).
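To put the audio-versus-video bandwidth point into rough numbers, here's a back-of-envelope comparison; the sample rate, channel count and video resolution are generic assumptions, not specs of any particular device:

```python
# Rough data-rate comparison: hearing-aid audio vs the kind of video an NPU handles.
# All figures are generic assumptions for a back-of-envelope comparison.

AUDIO_SAMPLE_RATE = 32_000  # Hz, a plausible hearing-aid processing rate
AUDIO_CHANNELS = 4          # e.g. two mics per ear
audio_samples_per_sec = AUDIO_SAMPLE_RATE * AUDIO_CHANNELS

VIDEO_WIDTH, VIDEO_HEIGHT = 640, 480
VIDEO_FPS = 30
video_pixels_per_sec = VIDEO_WIDTH * VIDEO_HEIGHT * VIDEO_FPS

print(f"audio: {audio_samples_per_sec:,} samples/s")
print(f"video: {video_pixels_per_sec:,} pixels/s")
print(f"video has roughly {video_pixels_per_sec / audio_samples_per_sec:.0f}x more raw values per second")
```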
Thanks, very familiar with the Pi, including the M.2 AI chip. Performance in our application is lacking. If you are indicating that hearing what someone says several seconds after their lips move is OK, wow. That would drive me nuts (a very short trip). I didn’t say highly cynical, just cynical. At 90 YO, I doubt I will see much of it.
That isn’t how it would happen. The AI would inform the controls of the DSP/amplifier to change how sound should be processed. Sound would be sub-optimal for a few seconds but you’d have something. I hope this helps.
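To picture that division of labour, here's a purely hypothetical sketch: a classifier that runs occasionally picks a parameter preset, while the low-latency DSP path keeps processing audio with whatever preset is currently active, so the audio itself is never delayed by seconds. None of this reflects actual hearing-aid firmware:

```python
from dataclasses import dataclass

# Hypothetical sketch (not any real hearing-aid firmware): an occasional AI
# classifier only chooses a parameter preset; the real-time DSP keeps running
# continuously, so the audio path never waits on the classifier.

@dataclass
class DspPreset:
    directionality: str      # "omni" or "beamformer"
    noise_reduction_db: int  # how hard to attenuate non-speech energy
    gain_db: int

PRESETS = {
    "quiet": DspPreset("omni", 0, 0),
    "speech_in_noise": DspPreset("beamformer", 10, 2),
    "music": DspPreset("omni", 2, 0),
}

class Dsp:
    """Stand-in for the real-time signal path."""
    def __init__(self) -> None:
        self.preset = PRESETS["quiet"]

    def apply(self, preset: DspPreset) -> None:
        # In a real device this would retune filters without interrupting audio.
        self.preset = preset

def classify_scene(estimated_snr_db: float, speech_present: bool) -> str:
    """Toy stand-in for the ML classifier that would run on an NPU every few seconds."""
    if speech_present and estimated_snr_db < 10:
        return "speech_in_noise"
    return "quiet"

dsp = Dsp()
scene = classify_scene(estimated_snr_db=5.0, speech_present=True)
dsp.apply(PRESETS[scene])
print(f"scene={scene}, directionality={dsp.preset.directionality}")
```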
What are you pairing with that supports LE Bluetooth? There isn’t much available currently and iPhone doesn’t have it. Might be why it isn’t working great for you.
I want it because Android will support it/Auracast in a couple of months, and it will use less battery and hopefully provide more functionality. Also, hopefully before I need new hearing aids there will be more support for it, such as at movie theaters.
Don’t confuse LE Bluetooth with LE Audio. iPhone has supported LE for many years, more than ten at least. That is the means by which the MyPhonak app controls the Phonak aids. LE Audio is the new, completely different protocol for audio (which uses LE Bluetooth) but is not yet supported on iPhone. There are threads/topics on here which go over a thousand posts discussing LE Audio.
So many chargers. Now I hope they’ll release HA models with batteries that last a bit longer than “a full day”, i.e. 15 h or so, given how the performance of those things decays over just a few years… The box with ventilation is rather nice - the one I have has a silica cartridge that needs replacement/regeneration every two weeks.
Nice video, and gesture control seems like an interesting concept if they manage to make it work right.
As for real-time translation, I think GN had this feature a long time ago (or maybe it was Starkey).
I remember seeing a demo; it never really took off. Maybe it didn’t work as intended, who knows?