Sure, I can explain, but it's not simple - I will try my best.
The white/green setting engages a WTA filter that takes 16FS to 256FS; it replaces the analogue type digital filter (a third order IIR filter) that is used with orange/red.
To perfectly reconstruct the analogue signal that was in the ADC before it was sampled, you need a sinc function filter. But a sinc function filter has values that take an infinite amount of time to decay to zero. Fortunately, the values of the sinc function halve every time you double the time period (that is, double the number of samples that the filter processes by doubling the tap length); that means for a given oversampling rate (16FS, say), if you double the tap length, the sinc function values at the ends of the filter halve. Eventually we get to the point where the values are so small that they no longer make any difference to SQ. With the M scaler, I have sinc function values that are smaller than 16 bits, which means that we can guarantee reconstruction at 16FS to better than 16 bits, as it is accurate to sinc to better than 16 bits. It's this aspect that gives the M scaler its transformational sound quality improvement.
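To make the halving argument concrete, here is a minimal sketch. It assumes a plain, unwindowed sinc, sin(pi t)/(pi t), whose peaks are bounded by 1/(pi |t|), with t measured in input sample periods. It is not the WTA filter itself, and the function names are made up for illustration.

```python
import math

def sinc_envelope(t):
    """Upper bound on |sin(pi*t)/(pi*t)| at a distance of t input sample periods."""
    return 1.0 / (math.pi * abs(t))

def periods_until_below(bits):
    """Smallest whole distance (in sample periods) where the envelope is below 2**-bits."""
    return math.floor(2 ** bits / math.pi) + 1

# Doubling the distance from the centre halves the envelope.
ratio = sinc_envelope(2000) / sinc_envelope(1000)

# Distance at which the unwindowed sinc envelope drops below the 16 bit level.
n16 = periods_until_below(16)
```

Under these assumptions the envelope only falls below the 16 bit level some twenty thousand sample periods away from the centre, which gives a feel for why very long tap lengths are needed.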
But there is another aspect of a sinc function: it is infinitely oversampled, so you need finer and finer time resolution. Each time you double the oversampling rate using a sinc function filter, you reduce the area of the error by a factor of four, so a 16FS filter will reduce the area of the peak transient error by 256 times. With 16FS we have an output about every 1.4 µs, and with 256FS the output is every 88 ns (both for 44.1 kHz material). Now when I designed the WTA 2 filter, which takes us from 16FS to 256FS, I did not expect any real change in SQ, as the ear/brain resolves timing differences of 4 µs and 16FS is already finer than that. But when I heard the filter, I was surprised at how much of a difference it made to the perception of the starting and stopping of notes; you can perceive transients much more easily. So I thought it would be cool for people to hear the effect of the filter, which is why I put the option in.
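The arithmetic behind those figures can be checked with a short sketch. It assumes a 44.1 kHz base rate (FS) and takes the rule "each doubling of the oversampling rate cuts the error area by four" at face value; the function names are illustrative only.

```python
def output_period_s(oversampling, base_rate_hz=44_100):
    """Time between output samples, in seconds, at oversampling x base rate."""
    return 1.0 / (oversampling * base_rate_hz)

def error_area_reduction(oversampling):
    """Error-area reduction factor, assuming a factor of 4 per doubling of rate."""
    if oversampling < 1 or oversampling & (oversampling - 1):
        raise ValueError("oversampling must be a power of two")
    doublings = oversampling.bit_length() - 1  # log2 for powers of two
    return 4 ** doublings

period_16fs = output_period_s(16)     # seconds between outputs at 16FS
period_256fs = output_period_s(256)   # seconds between outputs at 256FS
factor_16fs = error_area_reduction(16)
factor_256fs = error_area_reduction(256)
```

Going from 16FS to 256FS is four more doublings, so the factor grows by another 4**4 = 256.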
But getting back to your question - technically the white filter (256FS WTA filter engaged) is able to reconstruct transients more accurately, as running at 256FS means the residual peak error is reduced by another 256 times, to 65,536 times in total. When we increase the accuracy of transients, it becomes easier to perceive them, and when it's easy for the brain to perceive transients, things sound faster, brighter and sharper. So if you prefer the sound of orange over white, then you are doing the equivalent of soft focus for images; this suggests that your system is fundamentally too bright and edgy - either your transducers or the amp driving them is too bright. You may get better results by sticking with white and using DSP EQ at the source, but the downside of source DSP is a loss in transparency, as the data is no longer bit perfect, and EQ at 44.1 kHz will require re-dithering back to 24 bits, which degrades transparency. In the long term, ideally you need to replace the bright component in your system. If you are using loudspeakers, consider repositioning them, so that you optimise your system around the white or green filters.
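For anyone wondering what re-dithering back to 24 bits involves, here is a minimal sketch of one common approach, TPDF dither. It is an assumption for illustration, not a description of what any particular player or EQ plugin does, and `requantize_tpdf` is a made-up name. After EQ the samples are high-precision values, so they have to be rounded back to 24 bit integers, and noise is added before rounding so the rounding error is not correlated with the music.

```python
import random

def requantize_tpdf(sample, bits=24, rng=random):
    """Round a float sample in [-1.0, 1.0] to a signed `bits`-bit integer with TPDF dither."""
    scale = 2 ** (bits - 1)
    # Triangular (TPDF) dither: sum of two uniform values, each spanning +/-0.5 LSB.
    dither = (rng.random() - 0.5) + (rng.random() - 0.5)
    value = round(sample * scale + dither)
    # Clamp to the representable signed range.
    return max(-scale, min(scale - 1, value))

random.seed(0)  # fixed seed only so the example is repeatable
eq_output = [0.0, 0.25, -0.5, 0.123456789]   # stand-in for post-EQ samples
pcm24 = [requantize_tpdf(x) for x in eq_output]
```

The added noise is the price paid: each output can differ from the ideal value by up to about 1.5 LSB, so the data is no longer the original bit perfect stream.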