My theory on soundstage is that if a headphone delivers left and right channels accurately, the “soundstage” perception varies greatly from song to song, with most of them having a relatively “small soundstage” that sits closer to center.
The headphones that have a very large perceived soundstage, I feel, have some sort of artificial modulation/forced channel oscillation to make it feel as if sound is coming from different places around you. Making quiet sounds quieter and adding a bit of reverb can also contribute to this effect.
Again, this has just always been my internal theory.
I basically agree with you.
You can't really force much about spatial localization and general imaging with just headphones. A headphone swap can have a big impact, but it's hard to control and impossible to get right, since different people need a different FR, for example, and also because it's very far from enough for a realistic impression of space and soundstage.
What changes is (non-exhaustive list, off the top of my head):
- The amount of distortion (not just THD). Not something spatially meaningful for us (EDIT: not meaningful for predicting the resulting perceived space, not that it has no ability to impact it), but as it ends up being like having extra sounds that have no reason to be there, it most likely does affect our global perception of sound.
- The amount of isolation from "outside" sounds. It depends on how our brain interprets that: chances are it will consider something like having our head in a box that isolates us from the outside, or maybe it just keeps us conscious that the sounds only come from the headphone. Either way, it's unlikely to help us feel a big space even if the music signal is fine.
- The frequency response. Always important, more so at some frequencies than others when it comes to sound localization. With huge variations from headphone to headphone.
- Channel matching. It's very common to have a few dB of difference between the left and right cups at some frequencies, meaning those differences are applied to the interpretation of all FR cues used for sound localization. Ironically, people have a very hard time noticing it. It's one of those things that makes me smile when people talk about stuff at -100dB, but they have never been conscious of 1 or 2dB here and there on the left cup changing every sound at every sound level at those frequencies. Anyway, not noticing what the problem is does not mean they don't feel the consequences when interpreting the signal for spatial cues.
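To put rough numbers on the channel matching point, here is a quick sketch converting dB to linear amplitude ratios. The function name `db_to_ratio` and the example values are only mine for illustration, and the only thing assumed is the usual 20*log10 convention for amplitude.

```python
def db_to_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

# Channel imbalance of 1 or 2 dB at some frequency (illustrative values).
for imbalance_db in (1.0, 2.0):
    ratio = db_to_ratio(imbalance_db)
    print(f"{imbalance_db} dB imbalance: one cup is "
          f"{100 * (ratio - 1):.1f}% higher in amplitude")

# Something sitting 100 dB below the signal, for comparison.
print(f"-100 dB artifact: {100 * db_to_ratio(-100.0):.4f}% of the signal amplitude")
```

So 1 or 2dB of mismatch is roughly a 12% or 26% amplitude difference applied to everything in that band, while something at -100dB is about 0.001% of the signal.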
And in a smaller way, or ways that end up mainly into what was already listed:
- Comfort (how easily you can forget about wearing headphones).
- Angled drivers, leading mostly to a FR change, but one that might remove one HRTF component from being 'wronger'.
- Bigger drivers, leading mostly to a more stable FR each time we put the headphone back on our head. And of course there are other consequences of being bigger, for distortion, resonance frequency and so on, which should end up impacting FR and distortion. Maybe a bigger, more uniform wavefront makes it easier for the skin on the ear to feel some vibrations? IDK.
- How much they physically shake? Same thing as above. I have no idea whether that's good or not, or how much it might affect our perception of space. If they do shake, I imagine there are consequences for distortion too.
- Amount of crosstalk in the cable. If it isn't huge, people won't notice. I would tend to worry more about the amplifier's level of crosstalk. Not necessarily the unloaded figure, which is what we read as spec, but how bad it gets when driving a low-impedance, low-sensitivity load. That's an amp thing, but it's really how the headphone is made that will decide how bad it could potentially get with a given amp. Anyway, I don't list it at the top because we tend to notice only rather massive amounts, which will only happen under particular circumstances that are not determined by the headphone alone.
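As a rough illustration of why the load matters for crosstalk, here is a minimal sketch of one simple mechanism: a bit of resistance shared by both channels on the return path (a common ground wire, a connector contact, or part of the amp). The model and all names and values are my assumptions, not measurements: identical purely resistive drivers, ideal amp outputs with zero output impedance, and a hypothetical 0.5 ohm of shared resistance.

```python
import math

def parallel(r1: float, r2: float) -> float:
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def shared_return_crosstalk_db(load_ohms: float, shared_ohms: float) -> float:
    """Crosstalk (dB) into the idle channel through a shared return resistance.

    Assumed model: one channel is driven, the other amp output sits at 0 V,
    both drivers are resistive with the same impedance, and the two return
    currents share `shared_ohms` on the way back to ground.
    """
    # Voltage across the idle driver relative to the voltage across the
    # driven one: the shared resistance is in parallel with the idle driver.
    ratio = parallel(shared_ohms, load_ohms) / load_ohms
    return 20 * math.log10(ratio)

SHARED_OHMS = 0.5  # hypothetical shared return resistance
for load in (16.0, 32.0, 300.0):
    xtalk = shared_return_crosstalk_db(load, SHARED_OHMS)
    print(f"{load:5.0f} ohm load: {xtalk:.1f} dB")
```

With these made-up values, the same shared resistance gives clearly worse crosstalk into a 16 ohm load than into a 300 ohm one, which is the kind of dependence on the headphone I mean. Real amps and cables have other crosstalk paths too, so treat it as an order-of-magnitude picture only.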
And I think that might be it. Some people talk about internal reverb. Given the size of the cup, those reflections don't qualify as spatial cues for humans (the distance/delay involved is way too small), and I haven't read anything (not saying it doesn't exist, just that I personally haven't read such a paper) suggesting it is perceived in any way or that it affects our interpretation of space at all. I suspect that the amount of isolation from sealed cups might be a bigger perceptual deal than how they echo with tiny delays, not much bigger than when sounds bounce off different parts of the outer ear. Just my guess, though.
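For a sense of scale on those delays, a back-of-the-envelope sketch. The distances are assumptions picked for illustration (about 3 cm to the cup wall and back, versus 3 m of extra path for a reflection in a room), and the speed of sound is the usual value for air at around 20 degrees C.

```python
SPEED_OF_SOUND_M_S = 343.0  # air at about 20 degrees C

def delay_ms(extra_path_m: float) -> float:
    """Delay in milliseconds for sound travelling an extra path length."""
    return 1000.0 * extra_path_m / SPEED_OF_SOUND_M_S

cup_reflection = delay_ms(0.06)   # assumed: ~3 cm to the cup wall and back
room_reflection = delay_ms(3.0)   # assumed: 3 m of extra path in a room

print(f"inside the cup: {cup_reflection:.3f} ms")
print(f"room reflection: {room_reflection:.3f} ms")
```

Under those assumptions the cup reflection arrives well under a millisecond after the direct sound, while the room reflection is several milliseconds late.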
My point, I guess, is that it is much easier to just take a fairly clean headphone with a fairly smooth frequency response, and alter almost everything relevant to imaging with more control, range, and accuracy, using some DSP, starting with EQ. The rest, as you said, is in the song we play and will change greatly from song to song.
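As an example of the kind of EQ building block I have in mind, here is a minimal sketch of a single peaking filter using the widely used Audio EQ Cookbook (RBJ) biquad formulas. The function names are mine, and the settings (a -4 dB cut at 3 kHz with Q = 2 at 48 kHz) are arbitrary placeholders, not a recommendation for any headphone.

```python
import cmath
import math

def peaking_eq(fs: float, f0: float, gain_db: float, q: float):
    """Return normalized biquad coefficients (b, a) for a peaking EQ."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    # Normalize so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]

def gain_db_at(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

FS = 48000.0
b, a = peaking_eq(FS, f0=3000.0, gain_db=-4.0, q=2.0)  # placeholder settings
for f in (100.0, 1000.0, 3000.0, 10000.0):
    print(f"{f:7.0f} Hz: {gain_db_at(b, a, FS, f):+.2f} dB")
```

The filter applies the requested gain at the center frequency and leaves frequencies far from it essentially untouched. A real correction would chain several of these, ideally guided by measurements, and fix left and right separately if the cups don't match.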