What are AMD Teasing?

What is the point of getting a GPU that is 30€ cheaper if it then costs you 40€ more a year to run? That kind of kills the bang-for-buck idea, doesn't it?

It's often much more than just a 30-euro difference over here; you usually get a non-reference 290 for the price of a crappy version of a 4GB 770. Obviously now Nvidia have gone on the offensive with the new cards. Also depends on country and power prices, tbh. We'll see the difference when we compare same-gen cards from both sides. Judging by the 285, AMD can manage lower power draw fine. At the moment the 970 is a no-brainer, however.


It's pretty important that AMD respond quickly this time.
 
It will be interesting to see what AMD have in store, although they will have a hard job beating team green this time.
 
I hope for a card, presumably the R9 390X, which will perform about 15% to 20% better than the GTX 780 Ti, use the same amount of power as the R9 290X, and run about 10°C to 15°C cooler on an air cooler than the R9 290X, with the option of a pre-watercooled card costing around £50 more than the air-cooled variant.


What I really think it will be is a card that performs about 5% to 10% better than the GTX 780 Ti, needing watercooling as standard because the card's power draw kills any chance of air cooling.

I also think there is a slim chance a new APU and CPU will be shown: the APU a faster-clocked variant of what is already out with an upgraded GPU, and the CPU a 10-core AM3+ part clocked at around 2.8GHz to 3.2GHz.
 
But the bottleneck in batches/draw calls is to do with CPU load, not bus bottlenecks.

So Mantle etc. are about reducing CPU load proper, or doing more effective multithreading.
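To put rough numbers on that, here is a back-of-envelope sketch of why draw-call submission is a CPU problem. The per-call costs are purely illustrative assumptions, not measurements of any particular driver or API:

```python
# Back-of-envelope sketch: CPU time spent just submitting draw calls.
# The per-call cost figures are assumptions for illustration only.

FRAME_BUDGET_MS = 1000.0 / 60.0  # ~16.7 ms per frame at 60 fps

def submission_time_ms(draw_calls, cost_us_per_call):
    """CPU time (ms) spent issuing draw calls for one frame."""
    return draw_calls * cost_us_per_call / 1000.0

# Hypothetical numbers: ~40 us/call on a heavyweight API path vs
# ~5 us/call on a thin, Mantle-style path.
heavy = submission_time_ms(10_000, 40.0)  # 400 ms: blows the frame budget
thin = submission_time_ms(10_000, 5.0)    # 50 ms: still heavy, but 8x cheaper
```

Either way the limit comes from how fast the CPU can issue calls, not from how fast the bus can move the resulting data.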


Bus bandwidth on pci bottlenecks, I'm less sure about.

Does anyone actually know the frame by frame pcie data traffic loads and what it consists of and why more bandwidth alleviates a current measurable bottleneck?

I'm not sure I understand it right, but is local VRAM used for storing textures and geometry as well as frame buffers etc., or is texture/geometry pushed to the GPU from system RAM per frame?
From tests I gathered it was the former, in which case I'm not sure why it'd be a huge issue with current bus bandwidth.
If it's the latter then I'm not sure, but we'd probably be bus-bandwidth bound constantly, which isn't the case!

Hence my confusion on that point hehe :)

All GPU data is stored on the GPU. The only thing that usually gets pushed to the GPU every frame are shader parameters (constant buffers in DX11).

For example, a vertex buffer for a mesh is generally loaded by the CPU and then pushed to the GPU never to be touched again by the CPU. To move a mesh around, a transformation matrix is sent to the GPU (very little data compared to the entire vertex buffer). The vertex shader is then responsible for transforming each vertex by the matrix.
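A rough size comparison makes the point. The mesh layout and vertex count below are assumptions for illustration, not from any real app:

```python
# Illustrative comparison: geometry crosses the bus once; only a tiny
# transform matrix goes to the GPU per frame. Numbers are assumed.

VERTEX_BYTES = 32            # e.g. packed position + normal + UV as floats
MESH_VERTICES = 100_000

vertex_buffer_bytes = MESH_VERTICES * VERTEX_BYTES  # one-time upload: 3.2 MB
matrix_bytes = 4 * 4 * 4                            # 4x4 float matrix: 64 bytes

ratio = vertex_buffer_bytes / matrix_bytes          # 50,000x less data per frame
```

So once resources are resident, the steady-state per-frame traffic is tiny compared to the initial uploads.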

Do you actually have some data? I'm genuinely interested in the data flow and what it's made up of on a frame-to-frame and second-to-second basis... what it takes to 'fill' gigabytes a second of bandwidth.

Cheers

Dave

I do have some data, yes. I'd have to dig out the specifics when I'm back from work, but I can tell you it slowed my frame down by at least 1000x. This was done in a single-threaded DX11 app and the test was reading back from the GPU. APIs tend to be a bit more awkward when it comes to mapping back to the CPU, so I couldn't tell you how much the API directly affected these timings.
 
The marketing departments for NVIDIA and AMD must be tireless. They go back and forth week after week with the "next big thing". I'd go crazy.
 

Sounds good, I'm excited to see your data!

I did some rough tests here on my old GTX275/p867 mobo/2500k.

You'll have to excuse the less-than-scientific approach, but it was just checking the logic of the entire idea of GPU vs system RAM storage.
http://www.racedepartment.com/threads/racers-graphics-engine-graphics-generally-memory.54984/

But in the end it seemed that even huge amounts of texture data could be shipped across the PCIe bus within a single frame of rendering, at 60fps+.

Since my old GTX 275 had just a third of the VRAM required to store those textures, ignoring HDR cube maps, frame buffers, shadow maps etc., the system RAM > GPU bus bandwidth was clearly pretty damn high to render out 3GB worth of scene textures at 60fps.
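A quick sanity check on those numbers suggests the driver can't be pushing the whole overflow across the bus every frame. This is illustrative arithmetic; the PCIe figure is the usual theoretical rating and real throughput is lower:

```python
# Sanity check: could the texture overflow cross the bus every frame?
PCIE2_X16_GBPS = 8.0       # PCIe 2.0 x16, one direction, theoretical

scene_textures_gb = 3.0
vram_gb = 0.896            # reference GTX 275
overflow_gb = scene_textures_gb - vram_gb  # must live in system RAM

# If that whole overflow crossed the bus every frame at 60 fps:
required_gbps = overflow_gb * 60           # ~126 GB/s, far beyond PCIe 2.0
# So the driver must stream only the subset of texture data that each
# frame actually samples, not the full working set.
```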


Now obviously this isn't ideal, and it's clear that at frame render time, if data can't be stored locally, it comes over from system RAM.
This comes at a cost vs a local VRAM lookup, but it clearly wasn't totally terrible having to resort to system RAM over the bus.

Indeed, we only need to look back to the early AGP days to see that giving the GPU fast access to textures/data held in system RAM has been a priority for many years.



There is obviously LOADS of bandwidth even on those old mobos to be shunting data around.


I'm still more of the view that the real costs to fast performance are the CPU overheads proper, rather than anything in the bus itself.


I'll re-jig that test track for something crazy like 16GB of textures and see what happens with it all in view :D

Dave
 

I've moved the discussion here to keep this thread on topic.
 
Of course there is a bottleneck; how can you know how a computer works and not know there is a bottleneck at the GPU-CPU interface?

SAN JOSE, CA - GTC -- NVIDIA today announced that it plans to integrate a high-speed interconnect, called NVIDIA® NVLink™, into its future GPUs, enabling GPUs and CPUs to share data five to 12 times faster than they can today. This will eliminate a longstanding bottleneck and help pave the way for a new generation of exascale supercomputers that are 50-100 times faster than today's most powerful systems.

NVIDIA NVLink is the world's first high-speed GPU interconnect, helping pave the way to exascale computing.
NVIDIA will add NVLink technology into its Pascal GPU architecture -- expected to be introduced in 2016 -- following this year's new NVIDIA Maxwell compute architecture. The new interconnect was co-developed with IBM, which is incorporating it in future versions of its POWER CPUs.

Today's GPUs are connected to x86-based CPUs through the PCI Express (PCIe) interface, which limits the GPU's ability to access the CPU memory system and is four- to five-times slower than typical CPU memory systems. PCIe is an even greater bottleneck between the GPU and IBM POWER CPUs, which have more bandwidth than x86 CPUs. As the NVLink interface will match the bandwidth of typical CPU memory systems, it will enable GPUs to access CPU memory at its full bandwidth.

This high-bandwidth interconnect will dramatically improve accelerated software application performance. Because of memory system differences -- GPUs have fast but small memories, and CPUs have large but slow memories -- accelerated computing applications typically move data from the network or disk storage to CPU memory, and then copy the data to GPU memory before it can be crunched by the GPU. With NVLink, the data moves between the CPU memory and GPU memory at much faster speeds, making GPU-accelerated applications run much faster.


Read the full article if you want to know more...
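Scaling the press release's "five to 12 times faster" claim from the commonly quoted theoretical PCIe 3.0 x16 figure (an assumed baseline, since NVIDIA doesn't state one here) gives a rough sense of the range:

```python
# Rough scaling of NVIDIA's claimed 5x-12x speedup over today's interconnect.
PCIE3_X16_GBPS = 15.75              # PCIe 3.0 x16, theoretical, one direction

nvlink_low = PCIE3_X16_GBPS * 5     # ~79 GB/s
nvlink_high = PCIE3_X16_GBPS * 12   # ~189 GB/s
# That range lands in the same ballpark as CPU memory-system bandwidth,
# which is why NVLink could let the GPU read CPU memory at full speed.
```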

You forgot the last paragraph......
NVLink high-speed interconnect will enable the tightly coupled systems that present a path to highly energy-efficient and scalable exascale supercomputers, running at 1,000 petaflops (1 × 10^18 floating point operations per second), or 50 to 100 times faster than today's fastest systems.

So it only really concerns those wanting supercomputers and not your everyday consumer....
 
Please post the above in the thread that SPS specifically created for this argument in order to not derail this thread further.
 
Oh well... I got excited for a few minutes earlier when I rebooted my system and MSI Afterburner informed me of a new update. My spidey senses tingled, but alas, after installing said update and reading the v4.0.0.4604 release notes for the latest changes, it (amongst lots of coolish things) only states:
 Added AMD Tonga graphics processors family support
 Added core voltage control for reference design AMD RADEON R9 285X series graphics cards with NCP81022 voltage regulators
 Added official overclocking limits extension support for AMD Tonga graphics processors. Please take a note that unofficial
overclocking mode is currently not supported for AMD Tonga graphics processors family

Was hoping it was going to give away some more, but as it says Tonga 'family' it seems certain that new GPUs are coming.
 

Wait, the 285X? I hadn't even heard anything about it until now, only rumours... but now?
 
Strange about the 285X; there was news last week that AMD aren't making a 285X, just a 285.

 