I am not an audi, but I am familiar from grad school with test construction (primarily psychological), and many of the concepts still apply. Is the test valid? Is it reliable?
Validity is a determination of whether the test measures what it purports to measure. Part of reliability has to do with whether repeated measurements actually stay the same when the expectation is that results should remain constant. (This is usually referred to as test-retest reliability and is often measured by a correlation coefficient.)
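To make the test-retest idea concrete, here is a minimal sketch of computing a Pearson correlation between two sessions. The threshold values are made up purely for illustration (they are not real audiogram data), and the function name `pearson_r` is my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one session has no variation")
    return cov / sqrt(var_x * var_y)

# Hypothetical thresholds (in dB) for one ear at six frequencies,
# measured in two separate sessions. Invented numbers, for illustration only.
session_1 = [20, 25, 30, 45, 55, 60]
session_2 = [25, 25, 35, 45, 50, 60]

r = pearson_r(session_1, session_2)
print(f"test-retest r = {r:.2f}")
```

A value of r close to 1 would suggest the two sessions rank the frequencies the same way. Note that correlation alone would not flag a constant shift between sessions (say, every threshold 10 dB worse), so it is only one part of the reliability picture.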
OP is wondering whether the test is actually valid, as it might be conflating actual hearing with a strategy of anticipating when sound should be present and “guessing” even if it is not actually heard. The description of standard test procedures (https://www.asha.org/policy/gl2005-00014/#sec1.4), among other things, indicates that the tones should be presented for varying lengths of time. While this would not eliminate a strategy of guessing based on learning the parameters of the test procedure, it should tend to make the results more valid by making accurate guessing more difficult.
While I expect some improvement in test scores from familiarization with the test procedure (which can also lead to relaxing more and doing better as a result), overall I expect the results are fairly valid and fairly consistent. While I expect the test is not given if there is excessive ear wax (or an ear infection), I believe/hypothesize that the amount and location of ear wax has more impact on test results than guessing, and perhaps also outweighs the impact of excessive nervousness, which might be the case for those who have never had HAs and are being evaluated for whether aids should be recommended. I believe that test validity and reliability are probably higher for experienced HA people.
If I were to design test procedures, I would probably do the following (and then compare these procedures with the current protocol to determine whether there is a cost-benefit case for changing current procedures):
Present the tones not in the typical ascending order of frequency starting with 1 kHz, but randomly. Present the intensity of the tone not in an ascending sequence that starts at an intensity expected NOT to be heard and increases in 5 dB increments, but in a somewhat more varied manner.
I would also randomize which target each presentation goes to (left ear, right ear, or both ears).
The test procedure should ask the subject to press the button immediately when the tone is heard AND to release it immediately when the tone is turned off.
Scoring of subject responses should look for accurate answers at the presumed threshold AND subtract “points” for incorrect (or delayed) responses, for both the ON situation and the OFF situation.
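A rough sketch of what the randomized presentation and the ON/OFF scoring described above might look like in code. Everything specific here is my assumption for illustration: the frequency list, the 0 to 80 dB range, the 3-second cap on extra gap time, the 0.5-second allowed delay, the one-point scoring weights, and the names `build_schedule` and `score_trial`. The scoring function also assumes the tone is at a level presumed to be audible.

```python
import random

FREQUENCIES_HZ = [250, 500, 1000, 2000, 4000, 8000]  # assumed test frequencies
LEVELS_DB = list(range(0, 85, 5))                     # assumed 0-80 dB in 5 dB steps
TARGETS = ["left", "right", "both"]

def build_schedule(n_trials, seed=None):
    """Return n_trials presentations with frequency, level, ear and timing randomized."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        tone_s = rng.uniform(1.0, 2.0)            # tone lasts 1 to 2 seconds
        gap_s = rng.uniform(tone_s, tone_s + 3)   # varied gap, never shorter than the tone
        schedule.append({
            "freq_hz": rng.choice(FREQUENCIES_HZ),
            "level_db": rng.choice(LEVELS_DB),
            "target": rng.choice(TARGETS),
            "tone_s": tone_s,
            "gap_s": gap_s,
        })
    return schedule

def score_trial(tone_s, press_s, release_s, max_delay_s=0.5):
    """Score one trial for a tone presumed audible.

    press_s and release_s are seconds measured from tone onset; None means
    the subject never pressed (or never released). Each of the ON and OFF
    events earns +1 if timely and -1 if missed, early, or delayed.
    """
    score = 0
    # ON: the press must come after onset and within the allowed delay.
    # A negative press_s (pressing before the tone starts) counts as a guess.
    if press_s is not None and 0 <= press_s <= max_delay_s:
        score += 1
    else:
        score -= 1
    # OFF: the release must come after the tone ends and within the allowed delay.
    if release_s is not None and tone_s <= release_s <= tone_s + max_delay_s:
        score += 1
    else:
        score -= 1
    return score

if __name__ == "__main__":
    for trial in build_schedule(3, seed=42):
        print(trial)
    print(score_trial(tone_s=1.5, press_s=0.3, release_s=1.7))   # timely press and release
    print(score_trial(tone_s=1.5, press_s=-0.2, release_s=1.7))  # anticipatory press
```

The point of penalizing the press that arrives before onset is that anticipation stops paying off: a subject who guesses from rhythm loses as much as they gain.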
I’d note that the current test procedures have face validity for answering the questions: What tones can be heard at all? And at what intensity do they need to be in order to be heard?
My suggested procedure is related but is possibly more a test of, and answer to, the question:
When a frequency-intensity pair is present in the sound environment, is the subject consistent in actively responding to it (processing it as “relevant” information)? This is a more nuanced question and might(???) have greater relevance to a subject’s real-world ability to process speech.
I believe both of these suggestions would arrive at a more accurate threshold (validity) than current procedures, and they are a little more like the real world, in which one cannot always anticipate where a sound will come from, when it will come, and whether the voice will be that of a man, a woman, or a child.
However, the suggested modifications would take significantly longer to administer, and fitters or audis are sensitive to how much time they spend with someone. My hunch is that a test procedure that is more valid and/or more reliable might offer only an incremental improvement that is not worth the additional time.
From the standard test procedures:
"Threshold determination. The method described, an ascending technique beginning with an inaudible signal, is recommended as a standard procedure for manual pure-tone threshold audiometry.
Tone duration. Pure-tone stimuli of 1 to 2 seconds’ duration.
Interval between tones. The interval between successive tone presentations shall be varied but not shorter than the test tone."
**My bottom lines are:
A) I don’t expect procedures to change; I think they are probably good enough. A client who is motivated and understands the importance of the audiogram to the fitting procedure is not likely to want to game the system in any significant way (in most situations).
B) My bigger concern is ear wax. Since I am soon to go for new testing and new HAs, I have thought that I will make sure my ears are clean but still have an “average for me” amount of ear wax. My hunch is that most of us (if we do any kind of routine cleaning) tend to do it too close to the time of a pure tone hearing test, so the results might be slightly skewed toward measuring our best possible hearing rather than what is typical for us.** Related is the fact that most aids have a wax guard, and while we might clean/brush it daily, it gets at least partially clogged over the course of a month. (I actually look at the wax guard with my smartphone’s magnifying glass when replacing it to see whether and how much it has clogged.) Basically, if our hearing technology changes/degrades over a period of weeks - assuming that the placement of the receiver is identical from day to day - then there are a lot of moving parts in the equation of how well we hear (sensation to ear drum) over a period of time. Quite frankly, I am grateful and in awe that it often works quite well and consistently!
Note also that, as @molarface points out in the comment above this, how much vigilance/attention we give to hearing (as opposed to any other thoughts or sensations) has an impact not so much on whether the ear drum and nervous system receive the sensation, but on whether we process it. I expect that for many the test procedure is like waking up in the middle of the night, hearing a strange sound, and then lying still and listening intently to determine what was heard, where the sound came from, and what it means (an animal, a human intruder, a window that has blown open). Except for this middle-of-the-night experience, or truly focusing on just enjoying music, most of the time, even when we say we are listening to someone, only a portion of our awareness is involved. Pure tone testing is interesting/valuable but is only partially correlated with real world situations.
It has been fun for me to speculate about the question raised by OP; thanks for indulging my long comment.