User Review of Whisper Hearing Aids

So if they came out with a rechargeable earpiece, I wonder whether an upgrade would be possible during the 3-year period.

My former audiologist said something similar when I inquired. That reminds me of a story.

My brother-in-law is a laparoscopic surgeon. Standard practice today. But when he completed his training and looked for a job, he joined a surgical practice where he was not only the only laparoscopic surgeon in the practice, but the only one in the county.

Laparoscopy is superior to traditional surgery in every way. More precise. Less invasive. Less risk. Shorter recovery time. But doctors in that county continued to refer patients to their golfing buddies rather than to the new "off-brand" procedure.

Especially given the bundled way in which hearing aids are sold, with audiologists doubling as salesmen as well as health care practitioners, I am very suspicious of comments like the ones that you and I received.

I’ll be first in line.

It's definitely true, @happymach, that having more memory and processing power is always better. But if that involves the obvious trade-off of portability, then the question becomes which is more important, portability or processing power? So far the big 6 have chosen portability, and Whisper is the first to choose processing power. It does make sense that for AI-related activities, processing power is more critically needed now than ever before. BUT, I think that depends on how the AI is implemented.

When I read that second Whisper whitepaper, they make it obvious that the brain affords them an on-the-fly development platform, which allows them to target sound patterns in the incoming signal using their Sound Separation Engine (which has a DNN in it) and process the signal accordingly, and also to do other things not possible on a small HA, like collecting and saving users' data to send off to the cloud at quarterly HCP visits.

My hunch is that Whisper is doing AI processing differently than the big 6 brands that have AI in their HAs do it. I'm guessing, for example, that Oticon already did all their AI processing in their development lab offline, where they could afford to put huge computing power to work to crank through and process their 12 million sound scenes.

What they arrived at is a DNN already fully trained, with the weights and biases for the neural nodes finalized and verified for best accuracy. So they only have to implement the skeleton of this DNN framework and hardcode the neural nodes' coefficients (the fully developed weights and biases for the DNN) onto the hearing aid. It's kind of like only having to put a final "best fit equation for a multi-dimensional curve" onto the hearing aid, and this best-fit equation with built-in coefficients computes everything from there, arriving at a very close final result for any live input signal that passes through the hearing aids.
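
To make that concrete, here is a toy sketch of mine (not Oticon's actual code; the sizes and numbers are made up) of what "shipping only the trained skeleton" looks like: the weights are frozen constants, and the device only has to do a cheap forward pass per frame:

```python
import numpy as np

# Weights and biases "hardcoded" after offline training (hypothetical values)
W1 = np.array([[0.8, -0.3],
               [0.1,  0.9],
               [-0.5, 0.4]])      # 3 input features -> 2 hidden nodes
b1 = np.array([0.05, -0.02])
W2 = np.array([[1.2], [-0.7]])    # 2 hidden nodes -> 1 output
b2 = np.array([0.1])

def forward(features: np.ndarray) -> float:
    """One inference pass: no training on the device, just multiply-adds through fixed weights."""
    hidden = np.tanh(features @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))   # e.g. a speech-presence score between 0 and 1
    return float(out[0])

# Each incoming audio frame gets reduced to a few features, then pushed through the
# frozen network: like evaluating a "best fit equation" with built-in coefficients.
print(forward(np.array([0.2, -0.1, 0.4])))
```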

Whisper, on the other hand, chooses not to do the kind of "pre-processing" that Oticon does; instead they do "live" processing of the incoming signal to get their result. And that's why they need the extra processing power to crank through their DNN. There are probably advantages to doing the AI processing on-line vs off-line. Maybe doing it on-line gets them effective results without having to invest in training their DNN very extensively up front. Maybe the Oticon off-line approach requires a very extensive data collection and training effort up front, but in return it affords them a condensed and finalized DNN framework (the best curve-fitting equation) that is much easier to cram into a much smaller platform on the hearing aid.

Of course this is all educated guessing on my part; I'm not privy to how either company implements its DNN to really know. But it is consistent with Oticon being a very big company that can afford to invest in the off-line development and extensive training of its DNN up front, while Whisper, being a tiny start-up that cannot afford extensive DNN training up front, uses the on-line approach to start small (but needing a bigger brain) and collect users' data to continually train its DNN over time. Who knows, maybe over time, like you said, when the Whisper DNN is well trained enough, the brain can eventually shrink small enough to fit on the hearing aid.

1 Like

What you write makes good sense. However, even with 12 million “sound scenes,” Oticon’s DNN model is only as good as what it is trained with. One can legitimately ask where those sound scenes or samples come from. From Denmark? There may be advantages to Whisper’s method where they’re continually incorporating users’ lived real-world sounds into the model.

The sacrifice in portability that Whisper makes isn’t that severe: it’s a wireless rectangle that can fit in a pocket or backpack and can be up to 10’ away. I imagine that the big 6 weren’t / aren’t even willing to consider the somewhat backward step of relying on an external unit.

I have to say that I would rather pay $125 per month for 3 years than $4500 up front, even more so since the monthly subscription apparently includes quarterly visits.

1 Like

It's very true that the "model" of the trained DNN is only as good as the data used to train it. And real live data collected from actual users' sound scenes, wherever they are, would be more realistic than sound scenes collected god-knows-where by Oticon. At the end of the day, the users are the final judge of the quality of the data, wherever it comes from. My point was not really about the quality of the training data anyway; it's about the hypothesized approach to the DNN implementation, the pros and cons of on-line vs off-line.

The portability of the brain and the lease model is purely subjective. Some people may not mind those options and others think they’re deal breakers. Whisper for sure thinks that many people wouldn’t mind, and they can always put a positive spin on it, too.

One takeaway I learned from this thread is that you don't always have to take the brain with you. I did hear from the previous thread that the ear pieces don't need the brain all the time. But it's drilled in more on this thread that the ear pieces are premium HAs as stand-alone devices in their own right. So this alleviates a lot of the portability concern if you don't have to lug the brain around 80-90% of the time when you're at home anyway. But it's nice to have when you're out and about, and you wouldn't mind carrying it for those more challenging environments. The one nagging thought this brings up, though, is "why am I paying a premium lease for the brain if I don't get/need to use it very often?"

1 Like

@happymach, et al.: I must say that I'm starting to come around and see the advantages of Whisper's approach. Actually, I remember posting right near the beginning of this ongoing dialogue that I could appreciate some of the potential advantages. That said, my point of greatest resistance has been the fact that the aids belong to somebody else (I've never been into leasing), and I'm wondering, like @Volusiano, what their wellspring of improved knowledge is that allows them to roll out upgrades on such a frequent basis.

I'm certain that they'll be able to quickly refine the form factor of the BTE component, and the additional hardware that makes up the Brain doesn't throw me off too much. But I still feel uneasy that I don't know what product I'll be wearing in my ears 3, 6, 9, 12, …, 33 months from now. Perhaps, as @billgem and @x475aws assert, the product will keep getting better and better. But I'm old enough to have seen a fair number of perfectly good products be spoiled by engineers who made changes just for the sake of being different.

1 Like

Lease-to-never-own is not an attractive model to me. I usually keep my aids for 6-8 years. Also, my provider lets me pay off the HAs interest-free over a year, so endlessly paying out to lease a hearing aid seems…unattractive. Better still: it's possible that HAs will soon be covered (at least partially) by health care plans. I wonder what will happen to the "lease forever" model in that scenario, i.e., if the cost of high-end traditional aids comes down by several thousand dollars.

So far it's been indicated that Whisper is better in speech-in-noise situations than one other traditional brand. All of the other touted advances seem somewhat trivial. I'm still waiting to hear how good the quality of sound is, especially for music. Since nothing along these lines has been mentioned (but I may have missed it), I'm assuming that the sound quality is on par with traditional aids.

I also wonder how I’d like always having to carry around the brain clipped on to me somewhere.

So far, it seems more of a promise than a reality.

3 Likes

In one of the webinars, Don Schum says that they are working on a way to get coverage from insurance companies, and that in the model under discussion, the user would own the hearing aids at the end of the 3-year subscription.

The biggest advantage is that the 3-year-period includes hardware upgrades.

The point about the ability to be reimbursed by insurance companies is an important one.

(I would appreciate it if you were to bring to Dr. Schum's attention the large number of veterans, whose Administration would probably balk at the subscription model as it now stands.)

1 Like

In the article which Don Schum wrote for Canadian Audiologist and which I linked for SpudGunner above, this paragraph appears:

“This is where advanced AI techniques come into play. The Whisper signal processing approach used deep learning to analyze many sound samples to identify the patterns that uniquely constitute human speech. The acoustic patterns that comprise human speech are unique in the world of sound. Human listeners can innately identify speech in contrast to other sounds. But traditional approaches to describe these acoustic differences and then apply processing to separate the two have met with limited success. The Whisper approach applies a real-time AI processor that can execute 300 billion operations per second to use what it has learned about the unique characteristics of speech to separate speech from noise. The amount of processing resources that Whisper has dedicated to speech-in-noise processing is 200 to 300 times greater than other approaches in the hearing market.”

So, like Oticon, they have trained their system with speech sounds. But how many are "many"? Oticon's figure was 12 million. Are they trying to accomplish something different with their training and speech samples than Oticon was?

When the Veterans Administration contract includes Whisper, I will know that the company and the aids are important enough to pay attention to.

2 Likes

I remember a thread here years ago where a member suggested that the Big 6 companies had many technological advances stacked up and ready or nearly ready to implement. But they were trickling out these advances a little at a time every three years or so for business reasons: the idea being that even a marginal improvement every three years would allow the manufacturers to tout a "revolution in hearing" and entice a certain percentage of HA users to upgrade, especially if they were newly eligible for an insurance benefit.
If this is really the case, then I could see Whisper being a real disruptor to the established system. That is, Whisper would roll out any technological improvements it could achieve almost immediately to the actual product already in the hands (and ears) of its customers. This would allow Whisper to "lap" the Big 6 in introducing improvements and force the Big 6 to greatly accelerate their implementation of technological advances.
Whisper might well present a buyout target for one of the Big 6 companies. Just idle speculation here…

1 Like

If it were the Big 2 or 3, then I could see them holding back technology advancement on purpose, because they wouldn't have much competition to worry about. But it's the Big 6, and that's a lot of competition to be deliberately holding back the aces you have up your sleeve, because any of the other 5 guys can just slide right in and steal the show from under you. So I really doubt that that's the case, just by simple logical reasoning. I remember the last sentence that @jeffrey made in his last post here -> "So far, it seems more of a promise…" and it looks like many of us are starting to take these "promise potentials" at face value so far.

Nevertheless, I would like to see Whisper be the disruptor as well. It can only be good to have Whisper competing for market share with the big 6 anyway. But I think it'd be wise to make the decision to sign up for the Whisper system mainly because it proves to help you hear better than your current hearing aids. I wouldn't sign up for it on the grounds that "even if I don't get much better hearing from it, that's OK, because I see a lot of potential in it being the disruptor, and the hearing-better part can come later whenever they're ready."

I think speech detection is not that big of a challenge. Speech has very unique characteristics that can easily be differentiated. Modern HAs from the big 6 can detect what is speech and what is non-speech very easily. But speech detection is not the challenge. Speech and noise separation IS the challenge.

In the part I highlighted above, Dr. Schum is right that separating the noise from the speech has been very hard to crack (almost the holy grail). That's why beam forming to the front is so prevalent: if you can't separate the speech, the next best thing you can do is find its (front) direction and zoom in on it to eliminate the surrounding sounds. Even if that doesn't separate out the noise that's already diffused with the speech in front, it's still much better because the surrounding noise is eliminated. But we all know the trade-off here: beam forming causes the blinder effect.
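
For anyone curious what "zoom in on the front direction" amounts to, here's a bare-bones delay-and-sum beamformer sketch. This is just my illustration of the general principle, not any manufacturer's implementation, and all the numbers are made up:

```python
import numpy as np

def delay_and_sum(front_mic: np.ndarray, rear_mic: np.ndarray, delay_samples: int) -> np.ndarray:
    """Time-align the rear mic to the front direction and average the two signals.
    Sound arriving from the front adds up coherently; off-axis sound partially cancels."""
    aligned_rear = np.roll(rear_mic, -delay_samples)   # crude integer-sample alignment
    return 0.5 * (front_mic + aligned_rear)

# Hypothetical numbers: a 1 kHz tone from the front, 16 kHz sampling,
# reaching the rear mic 2 samples later than the front mic.
fs, f = 16000, 1000
t = np.arange(0, 0.01, 1 / fs)
front = np.sin(2 * np.pi * f * t)
rear = np.sin(2 * np.pi * f * (t - 2 / fs))
beamformed = delay_and_sum(front, rear, delay_samples=2)
```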

Oticon with the OPN came up with a new trick borrowed from the noise-cancelling principle used in headphones. Thanks to the much faster processing power made possible by the Velox platform, they can create a noise model from the sounds on the sides and back and subtract it from the noise-diffused speech in front in real time.
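
That "build a noise model and subtract it" idea is in the same family as classic spectral subtraction. Here is a rough sketch of that general principle, offered only as my illustration of the concept, not Oticon's actual algorithm (frame sizes, noise levels, and the noise estimate are all invented):

```python
import numpy as np

def spectral_subtract(noisy_frame: np.ndarray, noise_mag_estimate: np.ndarray) -> np.ndarray:
    """Subtract an estimated noise magnitude spectrum from a noisy frame, keeping the phase."""
    spectrum = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    cleaned_mag = np.maximum(mag - noise_mag_estimate, 0.05 * mag)   # floor to limit artifacts
    return np.fft.irfft(cleaned_mag * np.exp(1j * phase), n=len(noisy_frame))

# Toy usage: a frame of "speech" plus hiss, and a noise estimate taken from a speech-free moment.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(256) / 16000)
noise = 0.3 * rng.normal(size=256)
noise_estimate = np.abs(np.fft.rfft(0.3 * rng.normal(size=256)))
cleaned = spectral_subtract(speech + noise, noise_estimate)
```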

But I think AI takes a different approach to speech and noise separation. There's no more noise blocking via beam forming, or noise cancelling (a la OPN) per se. Those are all signal processing tricks, so to speak. Well, beam forming is still always there and you can still choose to have it, I guess, but then that's not done by AI.

The AI approach adopted by Whisper, as described in their whitepaper, is to build a library from thousands of hours' worth of unique sounds that can be looked up and compared. Classification of those sound characteristics is already built in according to their importance to the listener, and the sounds can be broken down further to identify characteristics that reveal more about their sources and environments. The AI probably does 4 things to the incoming signal: identify (via lookup and comparison to the thousands of hours of stored sounds), classify, separate, and process, all on the fly. So it needs a lot of brain processing power to support this.
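
Purely to illustrate my guess at that identify/classify/separate/process pipeline, here's a toy sketch; the library contents, feature vectors, and importance weights are all invented by me and have nothing to do with Whisper's actual system:

```python
import numpy as np

# Hypothetical "library" of reference sounds, each reduced to a made-up feature vector
LIBRARY = {
    "speech":  np.array([0.9, 0.1, 0.2]),
    "traffic": np.array([0.2, 0.8, 0.3]),
    "babble":  np.array([0.5, 0.5, 0.6]),
}
IMPORTANCE = {"speech": 1.0, "babble": 0.3, "traffic": 0.1}   # importance to the listener

def identify(features: np.ndarray) -> str:
    """Identify: look up the closest match in the stored library (nearest neighbor)."""
    return min(LIBRARY, key=lambda name: np.linalg.norm(features - LIBRARY[name]))

def process_frame(features: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Classify by importance, then 'separate'/process: keep what matters, turn down the rest."""
    label = identify(features)
    return IMPORTANCE[label] * frame

# A frame whose features look speech-like gets identified as speech and passed through at full gain.
print(identify(np.array([0.85, 0.15, 0.25])))   # -> "speech"
```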

So I think the training approach of the Whisper AI (the Sound Separation Engine) and the training approach that Oticon uses are vastly different. That's why Whisper never mentioned how many sound scenes they've captured and trained their DNN on. Instead, they only mentioned capturing "thousands of hours of unique sound sources in environments and learn to distinguish … portions… that are of high importance… from those that are less relevant." Whisper seems to train by building a library of sounds, with their classification and other rules and characteristics, and storing it in the brain, then doing on-the-fly access, lookup, and comparison to identify, then classify, separate, and process.

Oticon seems to build their DNN differently. They also focus on the different characteristics of speech and noise, but they designed a neural network that specializes in dealing with dynamic signals, meaning not just sounds at a single moment, but sounds as they vary over time, because they recognize that a key feature of sound is that there is a degree of continuity to it. Based on this requirement, Oticon went with a DNN called a Gated Recurrent Unit (GRU), a variant of the well-established LSTM neural network used for many other things. This gives them "an algorithm that not only recognizes different features that sounds have in a single moment, but also how those sound features vary over time. The ability to incorporate information over time is precisely the unique attribute that they need to analyze a dynamic signal such as sounds." The GRU/LSTM links the same neural network over time and has the network pass information to itself over and over to recreate this degree of continuity.
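
For reference, here's what a bare-bones GRU cell looks like, written from the textbook equations (random untrained weights, invented sizes, nothing specific to Oticon); the point is just to show how the same network passes its state to itself from one time step to the next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRU:
    """A single GRU cell with random (untrained) weights; sizes chosen arbitrarily."""
    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.normal(size=(n_hidden, n_in + n_hidden))   # update gate weights
        self.Wr = rng.normal(size=(n_hidden, n_in + n_hidden))   # reset gate weights
        self.Wh = rng.normal(size=(n_hidden, n_in + n_hidden))   # candidate state weights

    def step(self, x: np.ndarray, h_prev: np.ndarray) -> np.ndarray:
        """One time step: the new state is a gated mix of the previous state and the new input."""
        xh = np.concatenate([x, h_prev])
        z = sigmoid(self.Wz @ xh)                                 # how much to update
        r = sigmoid(self.Wr @ xh)                                 # how much of the past to reuse
        h_candidate = np.tanh(self.Wh @ np.concatenate([x, r * h_prev]))
        return (1 - z) * h_prev + z * h_candidate

# Feed a sequence of made-up per-frame sound features; the hidden state carries
# information forward from frame to frame, which is the "continuity over time" idea.
gru = TinyGRU(n_in=4, n_hidden=8)
h = np.zeros(8)
for frame_features in np.random.default_rng(1).normal(size=(10, 4)):
    h = gru.step(frame_features, h)
```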

Hence Oticon captured entire sound scenes with a 360-degree globe of mics (not just unique sounds like Whisper did, but whole scenes of multiple sounds) that last over a certain amount of time. Then they feed each sound scene into their GRU neural network to do its "thing", adjust for the best possible output, then rinse and repeat 12 million times. After it's all trained up, they have something that "just knows what to do according to the trained formula and just does it by quickly cranking through the formula", so to speak.
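
And the "adjust for the best possible output, then repeat millions of times" part is just a training loop. In its simplest possible form (a toy linear model with made-up data, nothing like Oticon's actual setup), it looks like this; the point is that only the final trained parameters need to ship:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                                     # stand-in for extracted scene features
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=1000)   # stand-in for the desired output

w = np.zeros(3)                                # the parameters to be learned offline
for _ in range(5000):                          # "rinse and repeat" many times
    grad = 2 * X.T @ (X @ w - y) / len(X)      # how wrong the current output is, and in which direction
    w -= 0.05 * grad                           # adjust toward a better output
# Only the final w needs to ship; evaluating X @ w at run time is cheap.
print(w)
```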

I guess maybe an analogy (not sure if it's a good one or not) is that Whisper brings the whole kitchen to the show and does all the cooking on the fly for a predetermined meal, given a set of ingredients. Oticon creates a cooking machine offline, presumably a machine small enough to just put on the kitchen table, and feeds the same ingredients through the machine to crank out the same meal. Not necessarily any faster; it's just that the machine fits on the table and isn't the whole kitchen. Of course, the quality of the two meals is for the users to judge.

2 Likes

Thank you, V. This is all making things more interesting. Whisper has slowly been adding audiologists. When I first called Whisper in June, they had only one in my state, the only one in all of New England. Since then they’ve added a second and are expecting to add someone in Boston within the next few months after s/he has completed the Whisper training. At a certain point they will reach a critical mass with the number of audiologists on board. That is the key to their marketing. At that point, things will take off and it will be time to break out the popcorn to see if and how the other companies respond.

1 Like

@billgem: Bill, I realize that you have a life with lots of other important things happening, but I am still intrigued by your accounts of everyday exchanges that you can understand better because of your Whisper aids.

Do you have any other tidbits or examples that you can share?

As far as I can tell, you’re saying that Oticon’s approach can do as well as Whisper without a super-fast external processor, because they’re processing their database differently? Kind of pre-processing it so the earpiece doesn’t have to do as much? That’s equivalent to saying that by coding a facial recognition database in some clever way, you could then recognize faces from pixelated images as well as someone else could do with clear images. This doesn’t make sense. A slower processor in the hearing aids can’t pick up as much sound detail from the surrounding scene, because it has to be done in real time.

First of all, ALL digital hearing aids CAN pick up the sound details from the surrounding scene in real time just fine, and in the same way, through an analog mic. It's all just combined into one signal, that's all, no matter how many levels of detail the sound signal has. Even after converting the analog sound signal to digital, the exact same input signal clarity is presented to ANY hearing aid. Yes, it's done in real time, and you don't need a super-fast processor to get that kind of input sound clarity in digital form. The original digital recorders decades ago could sample sound at 44 kHz and digitize it with a very slow (but fast enough for that era) processor and put it on CDs for people to hear with great pleasure already.

Secondly, I never said the part highlighted above. You're the one who drew that parallel, and I disagree with it because it's completely the wrong parallel to draw. The parallel you drew is fundamentally flawed because you're talking about converting something bad into something good or great. It doesn't happen that way. Garbage in -> garbage out.

Neither Whisper nor Oticon nor any other HA sees pixelated images. They ALL see clear images as the input. It's what they do with the images, how they slice and dice them and recreate and present the altered image at the output, that's different. Of course, it's not really images, it's sounds, but I'm just sticking with your analogy of images to be consistent.

Yes, I'm saying that Oticon processes the data differently, with the training done up front, so that in the end they just have a skeleton DNN framework with the appropriate, best-fit weights and biases for the neural nodes, which gives good enough results in their eyes. The data in its final form is much simpler and doesn't require as much room on the hearing aids, nor as much computing power to process.

Maybe another very simplified analogy I can give is a curve-fitting polynomial of the Nth degree that you develop to best fit, let's say, a million (X, Y) data points on an X-Y chart. Say you're presented with an X coordinate every 1/100 of a second in real time, and it takes 1 second for a guy to manually look it up on the chart to find out what the corresponding Y coordinate is. How do you manage to get the 100 corresponding Y coordinates in 1 second?

Well, one way is to hire 100 guys; each guy takes 1 second to look up one (X, Y) pair, and it'll all be done in 1 second.

Another way is to plug all 100 X coordinates into this polynomial curve-fitting equation and calculate the Y coordinates. Say this equation is captured in a calculator that can do one calculation in 1/100 of a second, and the 100 X coordinates can be presented automatically (meaning quickly) to the calculator one by one; it'll take the calculator 1 second to come up with the 100 corresponding Y coordinates.

So both approaches can give you the 100 Y coordinates in 1 second, but one requires 100 guys, while the other requires coming up with the polynomial curve-fit equation on a calculator, which takes some figuring out, training, and testing up front to get the equation right.
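
If anyone wants to see the analogy in code form, here it is literally (the million stored points and the cubic "true curve" are made up): a big table lookup vs. a small fitted equation, giving essentially the same answers:

```python
import numpy as np

rng = np.random.default_rng(0)
table_x = np.sort(rng.uniform(-1, 1, 1_000_000))                   # the million stored (X, Y) points
table_y = table_x**3 - 0.5 * table_x + 0.2                         # made-up "true" relationship

coeffs = np.polyfit(table_x, table_y, deg=3)                       # "training": fit the curve once, offline

incoming_x = rng.uniform(-1, 1, 100)                               # 100 X values arriving in one second

# Approach 1: look each value up in the big table ("hire 100 guys")
idx = np.clip(np.searchsorted(table_x, incoming_x), 0, len(table_x) - 1)
looked_up = table_y[idx]

# Approach 2: evaluate the small fitted equation ("the calculator")
computed = np.polyval(coeffs, incoming_x)

print(np.max(np.abs(looked_up - computed)))                        # both give essentially the same answers
```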

How else would you explain why Oticon doesn't need the brain power of Whisper but can still implement and execute a very workable DNN that's a fully viable option working just fine out there in the marketplace? The question is whether the results of the More are on par with the results of the Whisper brain, or significantly below. So far I'm only hearing "slightly better" from Bill in favor of the Whisper.