Phonak Audéo Sphere

GPUs are not designed from the ground up for neural networks; they are general-purpose parallel processors. Chips designed specifically to run neural networks are commonly called NPUs these days, and they have much lower TDPs than GPUs for neural-network workloads (you don't need a kW power supply!). I would also suspect that even advanced audio processing has lower computational requirements than video processing, based simply on typical audio bandwidth vs. video bandwidth.

So, as RobHoft said, hearing will be believing, but I don't see a need to be highly cynical out of the starting gate. A measly Raspberry Pi 5 with a 13 TOPS NPU add-on can identify objects and poses in video almost instantaneously. For audio, I think folks would be willing to accept the settings taking at least a few seconds to adapt, since jerking audio settings this way and that as environmental sounds fluctuate is disconcerting to most people. https://youtu.be/HgIMJbN0DS0 (Jeff Geerling's YouTube video on the Raspberry Pi 5 NPU AI Kit: setup, power consumption, and capabilities).
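To put rough numbers on the audio-vs-video bandwidth point, here is a back-of-envelope sketch of uncompressed data rates. The formats are my own illustrative assumptions (24 kHz, 16-bit, two microphones for audio; 1080p at 30 fps, 8-bit RGB for video), not the specs of the Sphere or any other device, and raw data rate is only a loose proxy for how much computation a network needs.

```python
# Back-of-envelope raw (uncompressed) data-rate comparison.
# ASSUMPTION: the audio and video formats below are illustrative only,
# not the actual specs of any hearing aid or camera.

def raw_bits_per_second(samples_per_second: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed data rate in bits per second."""
    return samples_per_second * bits_per_sample * channels

# Assumed audio: 24 kHz sampling, 16-bit samples, 2 microphones.
audio_bps = raw_bits_per_second(24_000, 16, 2)

# Assumed video: 1920x1080 pixels at 30 frames/s, 8 bits per colour channel, 3 channels (RGB).
video_bps = raw_bits_per_second(1920 * 1080 * 30, 8, 3)

print(f"audio: {audio_bps / 1e6:.3f} Mbit/s")
print(f"video: {video_bps / 1e6:.1f} Mbit/s")
print(f"video/audio ratio: {video_bps / audio_bps:,.0f}x")
```

Under these assumptions the video stream carries roughly 1,944 times as much raw data as the audio stream, which is the intuition behind expecting audio workloads to be much lighter.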
