I’m going to chat a little about a freshly-published auditory training study. I found it in the wild, with people discussing the paper as if it supported exactly the opposite of the conclusion that should be drawn from the data. That was troubling. So I took a look, and I want to share what I think. The paper is:
Anderson, S., DeVries, L., Smith, E., Goupell, M. J., & Gordon-Salant, S. (2022). Rate discrimination training may partially restore temporal processing abilities from age-related deficits. J. Assoc. Res. Otolaryngol. https://doi.org/10.1007/s10162-022-008
What is the study about?
The study tests the listening abilities of three groups of listeners to see whether their temporal processing would improve with training. This is a sensible target, as there are good reasons to think that temporal processing declines with age, and there’s decent reason to think that improved temporal processing could lead to improved speech recognition.
When a person is trained on one task but experiences an improvement on an untrained task, that’s called generalization. Finding an auditory training protocol that leads to robust generalized improvement in speech understanding is an important goal. In the study, direct improvement and three levels of generalization were assessed:
Direct improvement: did training improve performance on the trained task?
Near generalization: did training improve performance on the same temporal processing test, but using untrained stimulus rates?
Mid generalization: did training improve performance on untrained tests of temporal processing?
Far generalization: did training improve performance on untrained tests of speech recognition?
The research team trained people on a listening task (by having them do it over and over) and then looked to see if performance on post-tests improved compared to the pre-test results.
Listeners were split into two “treatment” groups. The Experimental group trained on the rate discrimination task, and the Active Control group trained on a different auditory task that doesn’t depend on temporal processing abilities.
Participants were recruited to fill one of three groups: young adults with minimal signs of hearing loss, older adults with minimal signs of hearing loss, and older adults with hearing loss.
This setup is good because comparisons can be made on the basis of age alone and on the basis of age plus mild-to-moderately-severe hearing loss.
What did they find?
On the trained task, they found that the Experimental group showed improved performance at post-test, though this effect was most reliable for the older adults without hearing loss and for the 300 Hz stimulus. Both older adult groups who did the Active Control task also improved on the 300 Hz stimulus at post-test, but not to the same degree.
Direct: ✔️
Near: ❔
Mid: ❔
Far: ❔
For near generalization, there was also an effect, but it was only robust in the older adults with normal hearing. The pattern of results supports the idea that training was effective for the older adults and generalized across pulse rates. Great!
Direct: ✔️
Near: ✔️
Mid: ❔
Far: ❔
But that’s where the ride ends. The data show no evidence of mid or far generalization. Bummer. This is a smart study conducted well, and the authors were right to think that this training had potential. It might have worked! It was promising. But they tried it, and it didn’t work.
Direct: ✔️
Near: ✔️
Mid: ❌
Far: ❌
How are these findings interpreted?
With that in mind, let’s revisit the title: “Rate Discrimination Training May Partially Restore Temporal Processing Abilities from Age‑Related Deficits”
I do not agree with this title. I believe it is misleading. They did not find that temporal processing abilities in general were restored. Here is what I would have titled it: “Auditory training of pulse-rate discrimination to target age-related deficits does not generalize to tests of temporal processing or speech recognition.”
Let’s look now at the abstract:
“Generalization was observed in significant improvement in rate discrimination of untrained frequencies (200 and 400 Hz) and in correlations between performance changes in rate discrimination and sentence recognition of reverberant speech.”
Wait a minute. What’s this about generalization to speech? And correlations?
Oh, yes, those correlations that showed up out of nowhere in Figure 7. The correlations that are not mentioned at all in the methods section, despite all other statistical procedures being detailed.
Hmm, I’m getting some “post hoc” smells. Let’s look closer at the results subsection “Correlations Among Perceptual Measures”. From the paper:
“Correlations were also calculated for the improvements in measures (post-test minus pre-test change) for the 300-Hz DL and the PLF (the rate at which the greatest changes were observed across groups) and measures that were related to pre-test 300-Hz DLs in Table 2 (non-speech measures: gap detection and 100-ms tempo discrimination; speech measures: 60 % TC, 0.6-s RV, and 1.2-s RV).”
This is a mess. Table 2 indicates that the 300-Hz DL was correlated with EVERY other measure. Why exactly was the 40 % TC condition excluded? If you’re thinking ceiling effects, look again at the speech recognition scores (Fig 6) for the 0.6-s RV condition: similar mean performance. So why is the 40 % TC condition excluded?
“The analysis was restricted to the training group listeners …”
What?! Sorry to interrupt, but let’s stop right here, mid-sentence. As they state in their discussion:
“A larger degree of improvement in temporal rate discrimination DLs occurred for the experimental group compared to the active control group, suggesting perceptual learning for the experimental group and some procedural learning for both groups”
If the analysis is restricted to just the Experimental group, then according to their own story of the data, the effects they are capturing are a combination of procedural and perceptual learning, and the Control group’s data suggest that the degree of procedural learning is large. That means that correlations in the Experimental group alone cannot be used to describe perceptual learning without the confounding effects of procedural learning.
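To make that confound concrete, here’s a minimal simulation (my own made-up numbers, not the study’s data). A shared “engagement” factor, think attention or motivation, drives both procedural gains on the trained task and gains on the speech tests. Perceptual learning is set to exactly zero, yet the within-group correlation between the two change scores comes out positive and sizeable:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 39  # same N as the paper's restricted analysis

# Latent engagement differs across listeners (hypothetical).
engagement = rng.normal(0, 1, n)

# Procedural learning tracks engagement; perceptual learning is ZERO.
procedural_gain = 2.0 * engagement + rng.normal(0, 1, n)
perceptual_gain = np.zeros(n)

# What the change scores actually measure:
rate_dl_gain = procedural_gain + perceptual_gain
speech_gain = 1.5 * engagement + rng.normal(0, 1, n)  # engaged listeners also try harder here

r = np.corrcoef(rate_dl_gain, speech_gain)[0, 1]
print(f"r(rate-DL gain, speech gain) = {r:+.2f}")  # sizeable, with no perceptual learning at all
```

A correlation like this tells you nothing about whether perceptual learning transferred; it is exactly the pattern you’d get from procedural learning plus shared attention.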
Let’s get back to the sentence I interrupted.
“The analysis was restricted to the training group listeners who had scores less than 100 % on the pre-test measures (N = 39), in other words, the listeners that had the potential to improve.”
A person with a score of 99 % meets their criterion for inclusion here and yet does not actually represent an individual who has “the potential to improve”. This is bad data grouping. To this point, when you look at the scatter plots in Fig 7, the data are swamped by ceiling/floor effects. It’s not meaningful to analyze them this way. A tiny simulation makes the problem concrete.
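In this sketch (again, hypothetical numbers), every simulated listener gets the same average true benefit from training, but measured scores are clipped at 100 %. Anyone starting near ceiling, including that 99 % scorer who “qualifies”, shows almost no measured improvement, and a spurious correlation with baseline appears purely from the clipping:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

pre = rng.uniform(50, 99.9, n)   # pre-test score (%), below 100 so everyone "qualifies"
true_gain = rng.normal(5, 2, n)  # same average true benefit for everyone

post = np.clip(pre + true_gain, 0, 100)  # scores cannot exceed 100 %
measured_gain = post - pre

below = pre <= 95
print(f"mean true gain:              {true_gain.mean():.2f}")
print(f"measured gain (pre <= 95 %): {measured_gain[below].mean():.2f}")
print(f"measured gain (pre  > 95 %): {measured_gain[~below].mean():.2f}")
# Correlation between baseline and measured gain created purely by the ceiling:
print(f"r(pre-test, measured gain):  {np.corrcoef(pre, measured_gain)[0, 1]:+.2f}")
```

The near-ceiling listeners learned just as much, but the measurement can’t show it, which is exactly why “less than 100 %” is not the same thing as “has the potential to improve.”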
Speaking of Fig 7, why did the 60% TC condition disappear mysteriously?
Why is it that 3 out of 4 speech conditions (or is it 2 out of 3?) put through this analysis did not show a statistically significant effect, yet the discussion and abstract strike a hopeful tone, as if generalization had been demonstrated? At best we can say that the people who paid attention to the training were also likely to pay attention to the speech testing.
These correlations have another gigantic problem. It is questionable to lump participants of all ages and hearing levels together, given how wildly different their baseline performance and their relative change at post-test are.
They are distinct subject populations, and they should not be grouped together to look for evidence of improvement on age-related deficits. In my opinion, these correlations should not have been included in the paper. To see how pooling alone can manufacture a correlation, consider the sketch below.
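Here’s a minimal demonstration (hypothetical numbers; the group labels are just shorthand for the study’s three populations). Within each group, the two change scores are completely unrelated; the groups merely sit at different baselines. Pool them, and the Pearson r looks impressive:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical group means for (rate-DL change, speech-score change).
groups = {"young NH": (1.0, 1.0), "older NH": (5.0, 4.0), "older HI": (9.0, 7.0)}

dl_parts, sp_parts = [], []
for mu_dl, mu_sp in groups.values():
    n = 20
    dl_parts.append(mu_dl + rng.normal(0, 1, n))  # no within-group link...
    sp_parts.append(mu_sp + rng.normal(0, 1, n))  # ...between the two measures

for name, d, s in zip(groups, dl_parts, sp_parts):
    print(f"{name}: within-group r = {np.corrcoef(d, s)[0, 1]:+.2f}")

dl, sp = np.concatenate(dl_parts), np.concatenate(sp_parts)
print(f"pooled r = {np.corrcoef(dl, sp)[0, 1]:+.2f}")  # large, driven entirely by group differences
```

The pooled correlation here is a statement about how the groups differ, not about whether any individual’s training gains transferred to speech.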
Let me say it again: this is a smart study conducted well, and the authors were right to think that this training had potential.
It might have worked!
It was promising. But they tried it, and it didn’t work.
Does this matter? Yes.
The University of Maryland released some PR about the study. The university PR got reprinted by something called The Brighter Side of News, which then made its way to BoingBoing.net as “Study finds you can change your brain to hear better in chaotic noisy environments”. Yikes! No, it does not!
Now the study is being misinterpreted as an example of auditory training that can help people understand speech in noisy restaurants. In fact, the study showed nothing like that. How did this happen? I don’t know, but perhaps the pressure to put out positive findings and the tricky challenge of communicating complex research designs each played a part. The consequence is a public misunderstanding of the real, high-quality science that the Maryland team performed.