This is an example of intersample overs. The green squares represent the PCM samples. The green line represents the actual waveform once reconstructed:
You can see that even though every sample is at or below 0 dBFS, the reconstructed waveform goes above it.
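The same effect can be reproduced numerically. This is a minimal sketch assuming ideal (Whittaker-Shannon, sinc) reconstruction and a hypothetical test signal: a sine at a quarter of the sample rate with a 45° phase offset, scaled so every sample is exactly ±1.0 (0 dBFS). The helper names `sinc` and `reconstruct` are just for illustration and don't come from any library.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc, the ideal reconstruction kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples: list[float], t: float) -> float:
    """Whittaker-Shannon interpolation: waveform value at time t (in samples)."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# fs/4 sine with a 45-degree phase: the samples repeat +1, +1, -1, -1,
# so no sample ever exceeds 0 dBFS (1.0).
samples = [1.0, 1.0, -1.0, -1.0] * 256

# Look between two +1 samples in the middle of the buffer,
# away from the truncated edges of the finite sum.
center = 512
grid = [center + i / 20 for i in range(21)]  # 512.00 .. 513.00 in 0.05 steps
true_peak = max(abs(reconstruct(samples, t)) for t in grid)

print(f"max sample: {max(abs(s) for s in samples):.3f}")
print(f"reconstructed peak: {true_peak:.3f} ({20 * math.log10(true_peak):+.2f} dBFS)")
```

The peak lands halfway between the two +1.0 samples and comes out close to √2 (about +3 dB over full scale), even though no individual sample goes past 1.0.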
To make it clearer, here's the same thing but with the Y axis set to percentage:
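The problem is that 16-bit data can't go above 100% (0 dBFS). 1111111111111111 is the highest number a 16-bit sample can represent (this treats samples as unsigned to keep things simple; real PCM is signed two's complement, so the positive ceiling is 0111111111111111 = 32,767, but the argument is identical). So when a DAC without any digital headroom tries to reconstruct this, the values that SHOULD keep going up can't, and you just get clipping, as seen on the Weiss DAC204: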
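Here's a sketch of what "can't go up, so it clips" means in integer terms, for a hypothetical no-headroom path that stores each reconstructed value as a 16-bit integer. It assumes signed 16-bit PCM (-32,768 to 32,767) rather than the unsigned 0 to 65,535 used above; the ceiling behaves the same way. `to_int16` is an illustrative helper, not any real DAC's code.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(x: float) -> int:
    """Scale a full-scale-relative value (1.0 = 0 dBFS) to 16-bit and saturate."""
    code = round(x * INT16_MAX)
    # Anything past full scale has nowhere to go, so it is clamped (clipped).
    return max(INT16_MIN, min(INT16_MAX, code))

# Values a reconstruction filter might produce around an intersample over.
waveform = [0.9, 1.0, 1.2, 1.414, 1.2, 1.0, 0.9]
print([to_int16(v) for v in waveform])
# → [29490, 32767, 32767, 32767, 32767, 32767, 29490]
```

Everything from 1.0 upward collapses to the same code, which is the flattened top you see on the scope.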
When you convert 16-bit data to 24 or 32 bits, you don't add any extra room above the signal for it to go into. A higher bit depth gives you more precision and the ability to describe SMALLER signals, not bigger ones (unless you also raise your DAC's maximum output accordingly when doing so, by about 48 dB for 16 → 24 bit).
This is why 16-bit has a dynamic range of about 96 dB and 24-bit has a dynamic range of about 144 dB (ignoring the effects of noise shaping): each extra bit adds roughly 6 dB at the quiet end, not at the loud end.
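Those figures come from the ratio between full scale and one step (LSB), which is 2^bits. A quick sketch, assuming that simple definition (it ignores dither, noise shaping and the extra ~1.76 dB you get when quoting sine-wave SNR):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one LSB, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24, 32):
    print(bits, round(dynamic_range_db(bits), 1))
```

Each bit is worth 20·log10(2) ≈ 6.02 dB, and all of it is added below the existing ceiling.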
Let's say, for example, we are converting from 16-bit to 24-bit.
We take the max 16-bit sample (1111111111111111), and the simplest way to convert it to 24-bit (i.e. no dithering etc.) is to just add zeroes at the end. So we now have 111111111111111100000000.
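Appending eight zeroes is the same as a left shift by 8 bits, i.e. multiplying by 256. A quick sketch (the variable names are just for illustration):

```python
max16 = 0b1111111111111111        # largest unsigned 16-bit value
max24_from_16 = max16 << 8        # append eight zero bits

print(f"{max16:016b} -> {max24_from_16:024b}")
print(max16, "->", max24_from_16)
# → 1111111111111111 -> 111111111111111100000000
# → 65535 -> 16776960
```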
In terms of the numbers these represent, we've gone from 65,535 to 16,776,960. But we also have to keep in mind that the DAC's maximum output now corresponds to a different value.
The maximum 24-bit value is 111111111111111111111111, which is 16,777,215 in decimal.
65,535 is 100% of 65,535, but 16,776,960 is about 99.998% of 16,777,215. We are still effectively at maximum output/clipping; the tiny difference is just because the new format can describe values more precisely. It hasn't changed the fact that we still cannot describe a sample that is over the maximum integer our binary format allows.
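To put a number on "effectively at max", here's a sketch that expresses the zero-padded sample as a fraction of 24-bit full scale, both in percent and in dB:

```python
import math

full_scale_16 = 2**16 - 1      # 65,535
full_scale_24 = 2**24 - 1      # 16,777,215
padded = full_scale_16 << 8    # 16,776,960 (16-bit max with eight zeroes appended)

ratio = padded / full_scale_24
print(f"{100 * full_scale_16 / full_scale_16:.4f}% of 16-bit full scale")
print(f"{100 * ratio:.4f}% of 24-bit full scale")
print(f"{20 * math.log10(ratio):.5f} dB")
```

The gap is about a ten-thousandth of a dB, i.e. the same full-scale sample, just written with more digits.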
You cannot describe a value of 67,000 with a 16-bit format. If we want to address this, we can instead apply some digital attenuation before oversampling. Let's use about 6 dB (i.e. halving every value) to make things easy. So the max PCM sample of 65,535 becomes about 32,768, and the previously impossible value of 67,000 becomes 33,500. Both can be described by a 16-bit format and sit within the DAC's maximum output of 65,535.
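A sketch of that attenuation, assuming a plain multiply by 0.5 (strictly a -6.02 dB cut) and rounding back to an integer code, with the same unsigned 65,535 ceiling used in the text. The helper `attenuate` is hypothetical; a real DAC's digital volume stage will be implemented differently.

```python
import math

FULL_SCALE_16 = 65535          # unsigned 16-bit ceiling, as in the text
GAIN = 0.5                     # halving: 20*log10(0.5) ≈ -6.02 dB

def attenuate(value: float) -> int:
    """Scale down and round to the nearest integer code (Python rounds .5 to even)."""
    return round(value * GAIN)

for value in (65535, 67000):   # a max sample and a would-be intersample over
    out = attenuate(value)
    print(value, "->", out, "fits" if out <= FULL_SCALE_16 else "clips")
print(f"gain = {20 * math.log10(GAIN):.2f} dB")
```

The point of doing this before oversampling is that the filter then has room above the attenuated samples for the reconstructed peaks to land in.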
It doesn't matter if you convert to 24 or 32 bit, because that just lets you describe the value more precisely; it doesn't suddenly give the DAC additional maximum amplitude. You COULD do it that way, meaning the DAC maps each 16-bit value to the same number in 24-bit rather than properly scaling it to the proportional 24-bit value. But then the loudest a 16-bit track could ever get on that DAC would be 65,535/16,777,215, which works out to about 0.39% of maximum, or roughly -48 dB, so 16-bit tracks would be hilariously/unusably quieter than 24-bit ones.
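Here's a sketch comparing the two mappings for a full-scale 16-bit sample. Both functions are hypothetical illustrations: `naive_level` reuses the 16-bit number unchanged as a 24-bit code, while `proportional_level` shifts it into the top 16 bits as described earlier.

```python
import math

FULL_SCALE_16 = 2**16 - 1      # 65,535
FULL_SCALE_24 = 2**24 - 1      # 16,777,215

def naive_level(sample16: int) -> float:
    """'Wrong' mapping: same number, now measured against 24-bit full scale."""
    return sample16 / FULL_SCALE_24

def proportional_level(sample16: int) -> float:
    """Proper mapping: shift into the top 16 bits so full scale stays full scale."""
    return (sample16 << 8) / FULL_SCALE_24

for name, level in (("naive", naive_level(FULL_SCALE_16)),
                    ("proportional", proportional_level(FULL_SCALE_16))):
    print(f"{name:>12}: {100 * level:.4f}% of full scale, {20 * math.log10(level):.2f} dB")
```

The naive mapping loses exactly the 8 bits' worth (~48 dB) of level that the proportional mapping keeps.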