Oh man. I sense trouble.
Firstly, regarding the ANC study, the only place where we really factored in the output of the headphones was the final 3QUEST analysis. (All the other tests were with no headphone playback, just different ANC modes and varying BGN scenarios.) For the 3QUEST tests we wanted to check the speech quality reproduction in the presence of noise. And the volume matching we did on those headphones was to gauge - and set - overall Active Speech Level (acc. to ITU-T P.56) when no noise is present. I *think* we used a 16-second speech file (4 x 4-second segments of speech with pauses). That gives us realistic source material with fairly broadband stimulation, and gets us close enough to volume matching the headphone outputs for THIS study.
Secondly (and more importantly?), how would I volume match headphones for frequency response comparisons? Traditionally, and from a standards perspective, frequency response is defined as output/input. When you measure the headphone (or other audio device) that way, it gives you the 'transfer function' of the system and makes the resulting plot independent of your input level (as long as you stay away from the rails, i.e. nothing is clipping). Therefore you can compare curves directly.
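To make the output/input point concrete, here is a minimal sketch. Everything in it is made up for illustration: `toy_device` is a hypothetical linear "headphone" (just a two-tap moving average), and `single_bin` and `response_db` are helper names I invented. It only shows that the ratio comes out the same whether you drive the device quietly or loudly, provided the device is linear.

```python
import cmath
import math

def single_bin(signal, freq_hz, fs_hz):
    """Complex amplitude of `signal` at freq_hz (a single DFT bin)."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
              for i, s in enumerate(signal))
    return 2 * acc / n

def toy_device(x):
    """Hypothetical linear device: two-tap moving average (gentle low-pass)."""
    return [0.5 * (x[i] + (x[i - 1] if i else 0.0)) for i in range(len(x))]

def response_db(freq_hz, input_amplitude, fs_hz=48000, n=4800):
    """Frequency response at one frequency, defined as output/input in dB."""
    x = [input_amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz)
         for i in range(n)]
    y = toy_device(x)
    ratio = abs(single_bin(y, freq_hz, fs_hz)) / abs(single_bin(x, freq_hz, fs_hz))
    return 20 * math.log10(ratio)

# Same device, two very different drive levels (20 dB apart).
quiet = response_db(1000, 0.1)
loud = response_db(1000, 1.0)
print(f"1 kHz response, quiet drive: {quiet:.4f} dB")
print(f"1 kHz response, loud drive:  {loud:.4f} dB")
```

The two printed numbers agree because the input level cancels in the ratio. A real headphone driven into clipping would break that, which is the "rails" caveat.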
Of course, that often means the two curves don't necessarily overlap or line up neatly, which is why people resort to plotting headphone response versus dB SPL and pinning the headphone responses at a certain frequency and output (500 Hz or 1 kHz at 94 or 90 dB SPL, whatever). However, if volume matching is done so we can easily compare frequency responses to make a judgement call on which headphone might be better(?) - or to set levels prior to subjective listening - then your choice of frequency can be hugely important.
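A quick illustration of why the pinning frequency matters. The two response curves below are invented numbers, not measurements of any real headphone, and `pin_curve` and `difference_at` are hypothetical helper names.

```python
def pin_curve(curve_db, ref_freq_hz, ref_level_db):
    """Shift a whole curve so it reads ref_level_db at ref_freq_hz."""
    offset = ref_level_db - curve_db[ref_freq_hz]
    return {f: level + offset for f, level in curve_db.items()}

# Made-up responses in dB SPL at a few frequencies (Hz).
headphone_a = {100: 96.0, 500: 92.0, 1000: 90.0, 4000: 93.0}
headphone_b = {100: 90.0, 500: 91.0, 1000: 94.0, 4000: 95.0}

def difference_at(freq_hz, ref_freq_hz, ref_level_db=94.0):
    """A minus B at freq_hz after pinning both curves at ref_freq_hz."""
    a = pin_curve(headphone_a, ref_freq_hz, ref_level_db)
    b = pin_curve(headphone_b, ref_freq_hz, ref_level_db)
    return a[freq_hz] - b[freq_hz]

diff_pinned_500 = difference_at(100, 500)
diff_pinned_1k = difference_at(100, 1000)
print(f"A vs B at 100 Hz, pinned at 500 Hz: {diff_pinned_500:+.1f} dB")
print(f"A vs B at 100 Hz, pinned at 1 kHz:  {diff_pinned_1k:+.1f} dB")
```

With these made-up curves, headphone A looks 5 dB bass-heavier than B when both are pinned at 500 Hz, and 10 dB bass-heavier when pinned at 1 kHz. Same two headphones, different conclusion.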
I think @arnaud alluded to a conversation he had with @jude earlier in this thread about using Loudness for this, which sounds like a good approach. Along the same lines as this discussion thread about ANC, a lot of the same psychoacoustic principles could be applied here:
(1) If you are using a broadband and time-varying source material (speech/music) to stimulate your headphones, you are arguably placing them in a realistic mode of operation (unlike pure tones).
(2) If you are analyzing the resulting output using an advanced, hearing-model-based loudness metric, you can then adjust the volumes so they match on a scale of subjective magnitude. Again, that should get you as close as possible to doing this by ear.
(3) When you then run the frequency response measurements, you should be able to better see the balance of each headphone as well as where they differ.
(4) When you subjectively listen to the headphones, you should be perceiving them as close to the same amplitude, which removes that variable from your evaluation and lets you focus on other elements in the audio playback.
It's honestly not something I have spent a lot of time investigating, but it seems like it has potential.
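For what it's worth, steps (1) and (2) could be sketched roughly like this. Big caveat: `rms_db` below is plain broadband RMS level, NOT a hearing-model loudness metric. I am using it purely as a stand-in so the example runs; the `level_metric` argument is where a proper loudness calculation would plug in. The signals are synthetic and all function names are hypothetical.

```python
import math

def rms_db(samples):
    """Broadband RMS level in dB re full scale (stand-in, not a loudness model)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def matching_gain_db(recording_ref, recording_other, level_metric=rms_db):
    """Gain in dB to apply to recording_other so it matches recording_ref."""
    return level_metric(recording_ref) - level_metric(recording_other)

def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    g = 10 ** (gain_db / 20)
    return [s * g for s in samples]

# Synthetic "recordings" of the same broadband-ish signal through two
# headphones; the second one simply plays back at half the amplitude.
fs = 8000
source = [0.3 * math.sin(2 * math.pi * 200 * i / fs)
          + 0.2 * math.sin(2 * math.pi * 1500 * i / fs) for i in range(fs)]
recording_a = source
recording_b = [0.5 * s for s in source]

gain = matching_gain_db(recording_a, recording_b)
recording_b_matched = apply_gain(recording_b, gain)
print(f"Gain needed on headphone B: {gain:.2f} dB")
```

Once the levels are matched this way, you would run the frequency response measurements and the listening comparison at those settings, as in steps (3) and (4).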