Presumably the performance impact of the inter-CCX communication (and the lack of a single shared L3) introduced by a 2+2 layout is greater than the impact of halving the L3 cache. I guess having twice the amount of L3 isn't that useful when half the cores have a far higher retrieval latency for the other half's L3 than for their own. That must be an issue in applications that share data across more than 2 cores. In theory you could keep both 8MB chunks of the L3 fully in sync with each other, but then you effectively halve the cache size anyway.
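To put that trade-off in rough terms, here's a back-of-envelope sketch in Python. The latency figures and the function names are placeholders I made up for illustration, not measurements of any real chip. It only shows the shape of the argument: average latency climbs as more shared data lives in the other CCX, and if both 8MB chunks have to hold the same data you're left with 8MB of unique capacity.

```python
# Back-of-envelope model. All numbers are placeholder assumptions, NOT measurements.
LOCAL_L3_NS = 10.0   # assumed latency for a hit in the core's own CCX L3
REMOTE_NS = 40.0     # assumed latency when the line sits in the other CCX


def avg_shared_latency(remote_fraction, local_ns=LOCAL_L3_NS, remote_ns=REMOTE_NS):
    """Average access latency when `remote_fraction` of shared lines are in the other CCX."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


def effective_l3_mb(slice_mb, slices, mirrored):
    """Unique data the L3 can hold; if every slice holds the same lines, only one slice counts."""
    return slice_mb if mirrored else slice_mb * slices


# 4+0: all sharing stays inside one CCX, so nothing is remote.
print("4+0 avg latency (ns):", avg_shared_latency(0.0))
# 2+2: assume half of the shared lines end up in the other CCX.
print("2+2 avg latency (ns):", avg_shared_latency(0.5))
# 2+2 with two independent 8MB chunks vs. two chunks holding identical data.
print("2+2 unique L3 (MB):", effective_l3_mb(8, 2, mirrored=False))
print("2+2 mirrored L3 (MB):", effective_l3_mb(8, 2, mirrored=True))
```

Plug in whatever latencies you trust for a given chip; the point is only that the extra 8MB has to buy back more than the remote-access penalty costs.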
I'm pretty sure the 2+2 layout of previous years was purely for yield reasons rather than performance, with the higher-end 2+2 models using the full 16MB more as damage limitation.