User Review of Whisper Hearing Aids

(I would appreciate it if you were to bring to Dr. Schum’s attention the large number of veterans whose Veterans Administration would probably balk at the subscription model as it now stands.)

1 Like

In the article which Don Schum wrote for Canadian Audiologist and which I linked for SpudGunner above, this paragraph appears:

“This is where advanced AI techniques come into play. The Whisper signal processing approach used deep learning to analyze many sound samples to identify the patterns that uniquely constitute human speech. The acoustic patterns that comprise human speech are unique in the world of sound. Human listeners can innately identify speech in contrast to other sounds. But traditional approaches to describe these acoustic differences and then apply processing to separate the two have met with limited success. The Whisper approach applies a real-time AI processor that can execute 300 billion operations per second to use what it has learned about the unique characteristics of speech to separate speech from noise. The amount of processing resources that Whisper has dedicated to speech-in-noise processing is 200 to 300 times greater than other approaches in the hearing market.”

So, like Oticon, they have trained their system with speech sounds. But how many are “many”? (Oticon’s figure was 12 million sound scenes.) Are they trying to accomplish something different with their training and speech samples than Oticon was?

At the point when the Veterans Administration contract includes Whisper then I will know that the company and the aids are important enough to pay attention to.

2 Likes

I remember a thread here years ago where a member suggested that the Big 6 companies had many technological advances stacked up and ready, or nearly ready, to implement. But they were trickling out these advances a little at a time, every three years or so, for business reasons: the idea being that even a marginal improvement every three years would allow the manufacturers to tout a “revolution in hearing” and entice a certain percentage of HA users to upgrade, especially if they were newly eligible for an insurance benefit.
If this is really the case, then I could see Whisper being a real disruptor to the established system. That is, Whisper would implement any technological improvements they could achieve almost immediately in the actual product already in the hands (and ears) of its customers. This would allow Whisper to “lap” the Big 6 in introducing improvements and force the Big 6 to greatly accelerate their implementation of technological advances.
Whisper might well present a buyout target for one of the Big 6 companies. Just idle speculation here…

1 Like

If it were the Big 2 or 3, then I could see them holding back technology advancements on purpose, because they wouldn’t have much competition to worry about. But it’s the Big 6, and that’s a lot of competition to be deliberately holding back your aces, because any of the other 5 can just slide right in and steal the show from under you. So I really doubt that’s the case, just by simple logical reasoning. I remember the last sentence that @jeffrey wrote in his last post here -> “So far, it seems more of a promise…”, and it looks like many of us are starting to take this “promised potential” at face value so far.

Nevertheless, I would like to see Whisper being the disruptor as well. It can only be good to have Whisper competing for market share with the Big 6 anyway. But I think it’d be wise to sign up for the Whisper system mainly because it proves to help you hear better than your current hearing aids. I wouldn’t sign up for it with the attitude of “even if I don’t hear much better with it, that’s OK, because I see a lot of potential in it being the disruptor, and the hearing-better part can come later whenever they’re ready.”

I think speech detection is not that big of a challenge. Speech has very distinctive characteristics that can easily be differentiated, and modern HAs from the Big 6 can already tell what is speech and what is non-speech very easily. So speech detection is not the challenge. Speech and noise separation IS the challenge.

In the bold part I highlighted above, Dr. Schum is right that separating the noise from the speech has been very hard (almost the holy grail) to crack. That’s why beamforming to the front is so prevalent: if you can’t separate the noise from the speech, the next best thing you can do is find the (front) direction of the speech and zoom in on it to suppress the surrounding sounds. That doesn’t separate out the noise that’s already mixed in with the speech in front, but it’s still much better because the surrounding noise is reduced. But we all know the trade-off here: beamforming causes the blinder effect.
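To make the beamforming idea concrete, here’s a minimal delay-and-sum sketch in Python/NumPy. To be clear, this is just a textbook illustration with made-up mic positions and sample rate, not Whisper’s or any manufacturer’s actual implementation:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions_m, look_angle_deg, fs=16000, c=343.0):
    """Delay each mic's signal so sound arriving from look_angle_deg adds up
    in phase, then average. On-axis sound is reinforced; off-axis sound
    (the 'surround') partially cancels -- the blinder effect in code."""
    num_mics, n = mic_signals.shape
    angle = np.deg2rad(look_angle_deg)
    delays_s = mic_positions_m * np.sin(angle) / c   # per-mic delay in seconds
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays_s):
        # Apply a fractional-sample delay as a phase shift in the frequency domain
        spectrum = np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n)
    return out / num_mics

# Example: two mics 12 mm apart (a made-up spacing), steered straight ahead
mics = np.random.randn(2, 16000)                 # stand-in for recorded audio
steered = delay_and_sum(mics, np.array([-0.006, 0.006]), look_angle_deg=0)
```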

Oticon with the OPN came up with a new trick borrowed from the noise-cancelling principle used in headphones. Because of the much faster processing power made possible by the Velox platform, they can build a noise model from the sounds to the sides and back and subtract it from the noise-diffused speech in front, in real time.
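The general “estimate a noise model, then subtract it” principle can be sketched with classic spectral subtraction. Again, this is only a toy illustration of the principle; Oticon hasn’t published their algorithm at this level of detail, and the frame sizes and floor value here are arbitrary:

```python
import numpy as np

def spectral_subtraction(noisy, noise_ref, n_fft=512, hop=256, floor=0.05):
    """Subtract an average noise magnitude spectrum (the 'noise model',
    estimated here from a separate noise recording) from a noisy signal,
    frame by frame, keeping the noisy phase."""
    window = np.hanning(n_fft)
    # Build the noise model: average magnitude spectrum of the noise reference
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_ref[i:i + n_fft] * window))
         for i in range(0, len(noise_ref) - n_fft, hop)], axis=0)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - n_fft, hop):
        frame = noisy[start:start + n_fft] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the model, but never dip below a small spectral floor
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        out[start:start + n_fft] += np.fft.irfft(clean_mag * np.exp(1j * phase), n_fft)
    return out
```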

But I think AI takes a different approach to speech and noise separation. There’s no more noise blocking via beamforming, or noise cancelling (à la the OPN) per se. Those are all signal-processing tricks, so to speak. Well, beamforming is still there and you can still choose to have it, I guess, but then that part is not done by AI.

The AI approach adopted by Whisper, as described in their whitepaper, is to build a library of thousands of hours’ worth of unique sounds that can be looked up and compared. The classification of those sounds’ characteristics is already built in, by their importance to the listener, and the sounds can be broken down further into characteristics that reveal more about their sources and environments. The AI probably does 4 things to the incoming signal, all on the fly: identify (via lookup and comparison against the thousands of hours of stored sounds), classify, separate, and process. So it needs a lot of brain processing power to support this.

So I think the training approach of the Whisper AI (the Sound Separation Engine) and the training approach Oticon uses are vastly different. That’s why Whisper never mentioned how many sound scenes they’ve captured and trained their DNN on. Instead, they only mentioned capturing “thousands of hours of unique sound sources in environments and learn to distinguish … portions… that are of high importance… from those that are less relevant.” Whisper seems to train by building a library of sounds, along with their classifications and other rules and characteristics, storing it in the brain, then doing on-the-fly lookup and comparison to identify, then classify, separate, and process.
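Just to picture the lookup-and-classify idea I’m speculating about (and it is only my speculation; Whisper hasn’t published an implementation like this), here’s a toy nearest-neighbor lookup against a pretend sound library. The feature sizes and labels are invented:

```python
import numpy as np

# A pretend "library" of labeled sound fingerprints standing in for the
# thousands of hours of stored unique sounds (random vectors here).
rng = np.random.default_rng(0)
library_features = rng.standard_normal((5000, 32))
library_labels = np.array(["speech", "fan", "traffic", "music"])[
    rng.integers(0, 4, 5000)]

def identify_and_classify(frame_features):
    """Steps 1-2 of the imagined pipeline: compare the incoming frame
    against every stored fingerprint and take the nearest match's label.
    Doing this for every frame against a big library is what would demand
    the kind of processing power a separate brain unit provides."""
    dists = np.linalg.norm(library_features - frame_features, axis=1)
    return library_labels[np.argmin(dists)]

print(identify_and_classify(rng.standard_normal(32)))  # e.g. "traffic"
```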

Oticon seems to build their DNN differently. They also focus on the different characteristics of speech and noise, but they designed a neural network that specializes in dealing with dynamic signals, meaning not just sounds at a single moment but sounds as they vary over time, because they recognize that a key feature of sound is that it has a degree of continuity. Based on this requirement, Oticon built their DNN around a Gated Recurrent Unit (GRU), a variant of the well-established LSTM neural network used for many other things. This allows them to have “an algorithm that not only recognizes different features that sounds have in a single moment, but also how those sound features vary over time. The ability to incorporate information over time is precisely the unique attribute that they need to analyze a dynamic signal such as sounds.” The GRU/LSTM links the same neural network over time and has the network pass information to itself over and over to recreate this degree of continuity.
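For anyone curious what “a network that passes information to itself over time” looks like, here’s a toy GRU mask estimator in Python (PyTorch). The layer sizes and features are made up purely for illustration and have nothing to do with Oticon’s actual network:

```python
import torch
import torch.nn as nn

class ToyGruMaskEstimator(nn.Module):
    """Toy recurrent network: maps a sequence of spectral frames to
    per-frequency gain masks, carrying a hidden state forward so each
    frame's decision can depend on what came before (continuity over time)."""
    def __init__(self, n_freq_bins=128, hidden_size=64):
        super().__init__()
        self.gru = nn.GRU(input_size=n_freq_bins, hidden_size=hidden_size,
                          batch_first=True)
        self.to_mask = nn.Linear(hidden_size, n_freq_bins)

    def forward(self, spectrogram):                 # (batch, time, freq)
        states, _ = self.gru(spectrogram)           # hidden state flows frame to frame
        return torch.sigmoid(self.to_mask(states))  # gains in [0, 1] per frequency bin

# Usage sketch: multiply a (fake) noisy spectrogram by the predicted mask
model = ToyGruMaskEstimator()
noisy = torch.rand(1, 200, 128)        # 200 frames of made-up spectral magnitudes
enhanced = noisy * model(noisy)
```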

Hence Oticon captured entire sound scenes with a 360-degree sphere of mics (not just unique sounds like Whisper did, but whole scenes of multiple sounds) lasting over a certain amount of time. Then they fed these sound scenes into their GRU neural network to do its “thing”, adjusted for the best possible output, then rinsed and repeated 12 million times. After it’s all trained up, they have something that just knows what to do according to the trained “formula” and does it by quickly cranking through that formula, so to speak.

I guess maybe an analogy, not sure if it’s a good one or not, is that Whisper brings the whole kitchen to the show and does all the cooking on the fly for a predetermined meal, given a set of ingredients. Oticon builds a cooking machine offline, presumably one small enough to put on the kitchen table, then feeds the same ingredients through the machine and cranks out the same meal. Not necessarily any faster, just that the machine fits on the table and isn’t the whole kitchen. Of course, the quality of the 2 meals is for the users to judge.

2 Likes

Thank you, V. This is all making things more interesting. Whisper has slowly been adding audiologists. When I first called Whisper in June, they had only one in my state, the only one in all of New England. Since then they’ve added a second and are expecting to add someone in Boston within the next few months after s/he has completed the Whisper training. At a certain point they will reach a critical mass with the number of audiologists on board. That is the key to their marketing. At that point, things will take off and it will be time to break out the popcorn to see if and how the other companies respond.

1 Like

@billgem: Bill, I realize that you have a life with lots of other important things happening, but I am still intrigued by your accounts of everyday exchanges that you can understand better because of your Whisper aids.

Do you have any other tidbits or examples that you can share?

As far as I can tell, you’re saying that Oticon’s approach can do as well as Whisper without a super-fast external processor, because they’re processing their database differently? Kind of pre-processing it so the earpiece doesn’t have to do as much? That’s equivalent to saying that by coding a facial recognition database in some clever way, you could then recognize faces from pixelated images as well as someone else could do with clear images. This doesn’t make sense. A slower processor in the hearing aids can’t pick up as much sound detail from the surrounding scene, because it has to be done in real time.

First of all, ALL digital hearing aids CAN pick up the sound details from the surrounding scene in real time just fine, and in the same way, through an analog mic. It’s just all combined into 1 signal, that’s all, no matter how many levels of detail the sound has. Even after converting the analog sound signal to digital, the exact same input signal clarity is presented to ANY hearing aid. Yes, it’s done in real time, and you don’t need a super fast processor to get that kind of input sound clarity in digital form. The original digital recorders decades ago could sample sound at 44.1 kHz and digitize it with a very slow (but fast enough for that era) processor and put it on CDs for people to hear with great pleasure already.

Secondly, I never said the part in bold above. You’re the one who drew that parallel, and I disagree with it because it’s completely the wrong parallel to draw. The parallel you drew is fundamentally flawed because you’re talking about converting something bad into something good or great. It doesn’t happen that way. Garbage in -> garbage out.

Neither Whisper nor Oticon nor any other HA sees pixelated images. They ALL see clear images as the input. It’s what they do with the images, how they slice and dice them and recreate and present the altered image at the output, that’s different. Of course, it’s not images but sounds; I’m just sticking with your analogy of images to be consistent.

Yes, I’m saying that Oticon processes the data differently, with the training done up front, so that in the end they just have a skeleton DNN framework holding the best-fit weights and biases for its neural nodes, good enough to give the results they want. The data in this final form is much simpler, so it doesn’t require as much room on the hearing aids, nor as much computing power to process.
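As a rough picture of what that “skeleton with weights and biases” amounts to at run time (with made-up layer sizes, nothing to do with Oticon’s real network), here’s how cheap inference is once training is done:

```python
import numpy as np

# Made-up layer sizes for illustration. After training, the network is
# ultimately just these small arrays of numbers stored on the device
# (here about 3,200 values in total).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((24, 64)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 24)), np.zeros(24)

def infer(frame_features):
    """Running the trained network per frame is only a couple of small
    matrix multiplies: no training, no library lookups."""
    hidden = np.maximum(frame_features @ W1 + b1, 0.0)  # ReLU layer
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))    # per-band gains in (0, 1)

print(infer(rng.standard_normal(24)))  # 24 gain values for one frame
```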

Maybe another very simplified analogy I can give is an Nth-degree polynomial curve-fit equation that you develop to best fit, let’s say, a million (X, Y) data points on an X-Y chart. Say you’re presented with a new X coordinate every 1/100 of a second in real time, and it takes a guy 1 second to manually look up on the chart what the Y coordinate for that X coordinate is. How do you manage to get the corresponding 100 Y coordinates in 1 second?

Well, one way is to hire 100 guys; each guy takes 1 second to look up 1 (X, Y) pair, and the whole batch is done in 1 second.

Another way is to plug each of the 100 X coordinates into the polynomial curve-fit equation and calculate the Y coordinate. If this equation is captured in a calculator that can do one calculation in 1/100 of a second, and the 100 X coordinates can be presented automatically (meaning quickly) to the calculator one by one, it’ll take the calculator 1 second to come up with the 100 corresponding Y coordinates.

So both approaches can give you the 100 Y coordinates in 1 second, but one approach requires 100 guys, while the other requires coming up with the polynomial curve-fit equation for the calculator, which takes some figuring out, training, and testing up front to get right.
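The analogy in code, with made-up numbers: the slow fitting happens once, offline (the “training”), and then evaluating the fitted equation for new inputs is cheap enough to do in real time:

```python
import numpy as np

# "Offline training": fit a polynomial once to a big batch of (X, Y) points.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1_000_000)
y = np.sin(3 * x) + 0.05 * rng.standard_normal(x.size)   # pretend lookup chart
coeffs = np.polynomial.polynomial.polyfit(x, y, deg=7)   # the slow, one-time part

# "Real time": evaluating the fitted equation for 100 new X values is just
# a handful of multiplies and adds per value, the cheap part.
new_x = rng.uniform(-1, 1, 100)
new_y = np.polynomial.polynomial.polyval(new_x, coeffs)
print(new_y[:5])
```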

How else would you explain why Oticon doesn’t need the brain power of Whisper but can still implement and execute a very workable DNN that’s a fully viable option working just fine out there in the marketplace? The question is whether the results of the More are on par with the results of the Whisper brain, or significantly below. So far I’m only hearing “slightly better” from Bill in favor of the Whisper.

Slightly better in normal everyday situations, but much, much better for speech in noise. Sorry if I wasn’t clear about that.

@billgem, would you go so far as to say that, for you, the Holy Grail of great speech-in-speech-noise performance has been found? I think that’s what many of us here, certainly myself, have been searching for with only limited success.

1 Like

@Bill, I’m having trouble getting my imagination around what “speech-in-noise” scenarios you’ve trialed the Whispers in. The two examples you mentioned in your first post are different from my definition of “speech in noise”. Mine would be something like telling fish stories with my friends (who own a medium-sized commercial mechanic’s shop) while loud work goes on all around in a reverb-filled, cavernous shop, with sounds ranging from impact wrenches to grinders to popping tire beads, etc. That’s where I had my first “WOW” moment with my aids, when I switched over to my Speech-in-Noise program.

Without trying too hard to force equivalencies, can you tell me how Whisper has performed for you in a similarly noisy situation?

Yes, for me it has. :+1:

I can’t invent situations, Jim. I don’t spend time in machine shops. I don’t think that even normal-hearing people do well with harsh machine noise. In fact, a lot of people wear headsets in those situations so their hearing won’t be damaged. I do spend days at home when yard equipment can be ear-splitting, and Whisper handles that much better.

The company specifically talks about helping users handle restaurant noise better. I was recently in a full banquet hall with 80 people, and I could hear normal conversation at my table great, and some at other tables as well. I spent today in a hospital where the acoustics are terrible. Whether it was in crowded hallways or a noisy cafeteria at lunchtime, I had no problem hearing. I’ve only had these HAs for 5 weeks and am retired, so there are limits on my range of exposures.

1 Like

@billgem: That’s fine, Bill. (I’m not trying to irritate you, but I have never attended a kids’ soccer game - never had any!)

The banquet example is a noisy situation that I can connect with. I can imagine cafeteria and hospital noises, too. I rarely go to restaurants, but I drive old vehicles out of necessity. Auto garages are more familiar to me, and my HAs have to allow me to function in that milieu.

I wonder if you can chronicle some kind of A/B/C comparison of your hearing experiences going forward, Bill, especially if you encounter difficult environment scenarios, then share it with us as you go.

A. With the Whisper brain.
B. With only the Whisper ear piece, but with the brain turned off.
C. With the OPN 1 that you currently own.

A description of your environment would be helpful: indoors or outdoors; if indoors, whether there’s a lot of reverb due to a high ceiling; how crowded the place is; how noisy it is; what kind of place it is; and whether it’s noisy due to lots of people talking, background music, or mechanical noise like fans, tools, or cars.

It looks like @d_Wooluf has already created another thread to discuss non-performance issues for the Whisper, so to keep this thread going with performance-specific discussion, it’d be great to document and share that level of detail with everyone here.

And of course, anyone else who is trialing the Whisper (I think just @x475aws so far) who’s willing to share their experience here via the A/B/C format above (with C being whichever HA they currently have) would be greatly appreciated as well.

2 Likes

I should be able to start next Friday.

1 Like

@happymach: You’re signing up for a trial?

1 Like

Yes! What do I have to lose but $75 for the fitting?

2 Likes