I was deciding whether to go with the ATI 3850 or the Nvidia 9600GT and had a really hard time, mainly because of the 320 vs 64 stream processors. I had assumed they were all equal, so I did what any normal guy (with a rather strange personality) would do: I investigated on the Internet. I have now realized the difference is not only in shader counts, but also in what the shaders do and how they do it.
Note/Disclaimer: the following text is "home-brewed" from about a dozen articles, so it is only as techy as I am. I assume no liability for the implications of my "study". Furthermore, in what follows, 1 ATI stream processor actually means a cluster of 5 of the units the company advertises. I do this to simplify the comparison between the 9600GT and the 3850.
Nvidia stream processors can do 1+1 operations per clock (1 complex and 1 simple), whereas ATI's can do 1+5 (1 complex and 5 simple... they are really a 1+1 processor plus 4 simple processors). Under this simplification both cards (9600GT and 3850) have 64 "stream processors". When it comes to (stock) clock speeds, Nvidia runs them at 1625 MHz and ATI at 670 MHz, which puts Nvidia's SPs at roughly 2.4x the clock speed.
Taking this into account, I conclude that Nvidia has the advantage in applications where the shading involves more advanced operations (roughly 2.4x more complex-shader throughput), whereas ATI has the advantage where the shading involves more simple operations (roughly a 2.06x advantage, to be specific).
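For anyone who wants to check the arithmetic, here is a tiny Python sketch of that simplified model. The 1+1 / 1+5 split and the stock clock figures are the same ones used above; everything else is just my back-of-the-envelope math, not anything measured.

```python
# Back-of-the-envelope check of the ratios above, using the simplified
# per-"SP" model from this post (Nvidia: 1 complex + 1 simple op per clock,
# ATI: 1 complex + 5 simple ops per clock) and the stock shader clocks.
NV_CLOCK_MHZ  = 1625   # 9600GT shader clock
ATI_CLOCK_MHZ = 670    # 3850 clock

# Complex-op throughput per SP (million ops/s): one complex op per clock on both.
nv_complex  = NV_CLOCK_MHZ * 1
ati_complex = ATI_CLOCK_MHZ * 1

# Simple-op throughput per SP: 1 per clock on Nvidia, 5 per clock on ATI.
nv_simple  = NV_CLOCK_MHZ * 1
ati_simple = ATI_CLOCK_MHZ * 5

print(f"complex: Nvidia/ATI = {nv_complex / ati_complex:.2f}x")  # ~2.43x for Nvidia
print(f"simple:  ATI/Nvidia = {ati_simple / nv_simple:.2f}x")    # ~2.06x for ATI
```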
Which one is "better" I cannot say from a purely theoretical point of view (but looking at the benchmarks I can see the 9600GT has better software* support).
*software to be read as "computer game"
No matter what ATI fans say, and no matter how much driver optimization the 38x0 cards get, they will still need game developers to optimize their products to use simpler operations. Developers cannot, however, tailor games exactly to the hardware characteristics of ATI cards, because that would mean very poor performance on Nvidia cards. Nvidia is in the same position: software can take advantage of its architecture, but not to the full extent, because then ATI performance would be very poor. Game developers want their games to run on the widest range of GPUs, so the most probable outcome is an optimization around a 1+3 mix (keeping in mind the two manufacturers' different shader clocks, and yes, leaning slightly in ATI's favour) that would yield roughly the same results on both cards (with stock and overclocked speeds playing their parts). A small sketch of this trade-off follows below.
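To make that concrete, here is a rough sketch of how a "1 complex + k simple" shader mix would behave per SP under the same simplified model and stock clocks as before. It assumes perfect packing of operations onto the units (which no real scheduler achieves), so it only illustrates the argument: the crossover sits between 1+2 (Nvidia ahead) and 1+3 (ATI slightly ahead).

```python
# Rough model of a "1 complex + k simple" shader mix per SP (my cluster naming),
# assuming operations can be packed perfectly onto each unit every clock.
# This is only an illustration of the argument above, not a real scheduler.
NV_CLOCK_MHZ,  NV_COMPLEX,  NV_SIMPLE  = 1625, 1, 1   # 9600GT
ATI_CLOCK_MHZ, ATI_COMPLEX, ATI_SIMPLE =  670, 1, 5   # 3850

def iterations_per_sp(clock_mhz, complex_per_clk, simple_per_clk, k):
    # Clocks needed per shader iteration = whichever unit is the bottleneck.
    clocks = max(1 / complex_per_clk, k / simple_per_clk)
    return clock_mhz / clocks   # million iterations per second per SP

for k in (1, 2, 3, 4, 5):
    nv  = iterations_per_sp(NV_CLOCK_MHZ,  NV_COMPLEX,  NV_SIMPLE,  k)
    ati = iterations_per_sp(ATI_CLOCK_MHZ, ATI_COMPLEX, ATI_SIMPLE, k)
    print(f"1+{k}: Nvidia {nv:6.0f}  ATI {ati:6.0f}  (Nvidia/ATI {nv/ati:.2f}x)")
```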
So far I have considered a 64 SP card (the 9600GT)... moving to an 8800GT, the gap in complex operations becomes a lot bigger and the gap in simple operations a lot smaller, since there are 48 more SPs in the card to... shade (at a slightly lower clock, which doesn't cripple them that much). The 3870 has higher clocks that make up part of the difference, but it still has only 64 SPs (as defined by my own naming convention).
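If you want to plug the bigger cards into the same model, here is a rough extension. Note that the 112 SP / 1500 MHz figures for the 8800GT and the 775 MHz figure for the 3870 are the commonly quoted stock specs, which I am adding here; they don't come from the calculations above.

```python
# Extending the same simplified model to the bigger cards mentioned above.
# 8800GT and 3870 clocks/SP counts are the commonly quoted stock specs.
cards = {
    # name: (SPs in my cluster naming, shader clock in MHz, simple ops per SP per clock)
    "9600GT": (64,  1625, 1),
    "8800GT": (112, 1500, 1),
    "3850":   (64,   670, 5),
    "3870":   (64,   775, 5),
}

for name, (sps, clock, simple) in cards.items():
    complex_rate = sps * clock            # million complex ops per second
    simple_rate  = sps * clock * simple   # million simple ops per second
    print(f"{name}: complex {complex_rate/1000:6.1f} G/s   simple {simple_rate/1000:6.1f} G/s")
```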
Empirical evidence (aka "game benchmarks") is relatively consistent with my theory (I can see something like a 1+2 optimization, which probably has to do with Nvidia's "The way it's meant to be played" program). If it weren't, we would constantly see some games running twice as fast on one card and others where it is the other way around (considering that the 9600GT and the 3850 have roughly the same core frequency).
Final note: I am aware the post title is awkward and that the movie was called "300"... but I couldn't come up with a better one.
