Looks like it could be the latter, that Roger just provides the best SNR.
Because the 61% claim for Roger with Infinio is made on the following Sonova web page:
Phonak elevates sound quality with new Infinio portfolio, introducing a paradigm shift with real-time AI for speech clarity in noise in Audéo Sphere Infinio | Sonova International
Here is the text:
The Infinio Platform powered by the ERA Chip
The Infinio platform offers remarkable improvements in:
- Exceptional Sound Quality from the first moment: Infinio achieves
  - 93% user preference from the first fit compared to leading competitors [2]
  - 45% reduction in listening effort* [3]
  - 21% less fatigue** [4]
  - 61% improvement in speech understanding (when combined with Roger On)*** [5]
But reference 5 is just a 2020 paper showing that the Roger Pen provides a 16% word-recognition improvement. The Pen operates only in a fixed omnidirectional mode, whereas the Roger Select operates in an adaptive directional mode, and it is the Select that provides the 61% improvement in word recognition.
Intervention: The two wireless microphones by Phonak, the Roger Pen and the Roger Select, used the same digital modulation protocol to transmit the signal to compatible receivers. However, the Roger Pen operated in a fixed omnidirectional mode, whereas the Roger Select operated in an adaptive directional mode.
Conclusions: The results suggest that the Roger Select microphone can provide significant benefits in speech recognition in noise over the use of HA/CI alone (61%) and also significant benefits over the use of a Roger Pen (16%) in a simulated group dining experience.
Source of quotes: Thieme E-Journals - Journal of the American Academy of Audiology / Abstract (thieme-connect.de)
Theoretically, though, the Roger Select can be nowhere near as good as the Sphere DNN at filtering out noise, so if there were a way to pass the Roger's output through DeepSonic processing, speech clarity should get even better. Maybe, as others have suggested just above in this thread, the DeepSonic chip works only on input from the external mics, and there is no way for it to process a streamed source or Roger mic input because those are entirely separate DSP channels?
Edit/Update: I read somewhere in the description of DeepSonic processing that the sound input is broken down into 64 frequency bands, and each band is represented as a complex number with a magnitude and a phase. The phase is presumably something like a time delay in that frequency band between the ears. Frequency components whose phase doesn't match that of a speaking voice might then be the more likely candidates for noise-reduction processing.
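To make that concrete, here is a minimal sketch of what I mean in Python/NumPy. The 64-band figure comes from the description I read; everything else (frame length, sample rate) is just an assumption for illustration, not anything Phonak has published:

```python
import numpy as np

# Sketch only: take one short frame of audio, split it into 64 frequency bins
# with an FFT, and represent each bin as a complex number (magnitude + phase).
fs = 16000                       # assumed sample rate in Hz (not from Phonak)
frame = np.random.randn(128)     # placeholder for 128 mic samples (8 ms at 16 kHz)

spectrum = np.fft.rfft(frame)[:64]   # 64 complex frequency bins
magnitude = np.abs(spectrum)         # energy in each bin
phase = np.angle(spectrum)           # phase of each bin, in radians

for k in (0, 15, 31, 63):
    print(f"bin {k}: magnitude={magnitude[k]:.3f}, phase={phase[k]:+.3f} rad")
```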
So that might be what's missing from streamed or Roger input: the spatial sound information that lets frequency content from a speaking source be differentiated from noise coming from other directions. OTOH, both streamed audio and Roger On input, IIRC, can be in stereo, but maybe such sources don't carry as much spatial or temporal information as the HA body mics can pick up from external sound sources? Purely speculation on my part as to why no further DeepSonic processing of Roger input might be possible.
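And here is a toy illustration of the spatial cue I'm speculating about: with two ear-level mic signals you can compute a per-band interaural phase difference, which a single pre-mixed streamed or Roger signal simply can't provide. The numbers (16 kHz sample rate, 5-sample delay, 1 kHz tone standing in for a voice component) are made up for the example and say nothing about how DeepSonic actually works:

```python
import numpy as np

# A talker straight ahead reaches both ears at the same time (phase difference
# near zero per bin); sound from the side reaches one ear slightly later, so
# its bins show a larger phase difference. A streamed or Roger signal is one
# pre-mixed source, so this between-ear cue isn't available at all.
fs = 16000
t = np.arange(128) / fs
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz component of a "voice"

delay_samples = 5                        # ~0.3 ms interaural delay for an off-axis source
left = tone
right = np.roll(tone, delay_samples)     # same sound, arriving later at the other ear

L = np.fft.rfft(left)[:64]
R = np.fft.rfft(right)[:64]
ipd = np.angle(L * np.conj(R))           # interaural phase difference per bin

k = round(1000 * 128 / fs)               # FFT bin closest to 1 kHz
print(f"phase difference at ~1 kHz: {ipd[k]:+.2f} rad")  # nonzero because the source is off-axis
```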