I'm intrigued by CPU speeds

Leemundo

New member
Afternoon all,

Just a simple question that I've never quite understood....

How can a CPU with a lower single-core speed outperform a CPU with higher speeds?

I've noticed it more between AMD and Intel CPUs. With older-generation CPUs, AMD often had higher frequencies, but Intel managed to pip them when benchmarked.

Probably a silly question with a lot more involved than just the speed, but it's something I've always pondered.

If anyone can enlighten me it would be much appreciated.

Regards,

Lee.
(Feeling like a noob and asking noobie questions)
 
It's not a silly question, we all have to learn things somewhere.

I think the easiest way to explain it is that it also matters how much work can be done per clock cycle.

It's kinda like moving a pile of soil with a wheelbarrow: a bigger wheelbarrow can move more dirt in a single trip. In the same way, a CPU core with more execution units etc. can finish more calculations within a single clock cycle.

Basically, the two ways of boosting single-threaded CPU performance are designing a more complex core or upping the CPU's clock speed. Intel released a post a few years back explaining why CPUs aren't at 10GHz yet. Link here.
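The wheelbarrow idea can be put into a toy formula: single-threaded performance is roughly instructions-per-clock times clock speed. The figures below are made up purely for illustration, not measurements of any real chip:

```python
# Rough mental model, with made-up numbers: performance ≈ IPC × clock speed.
def perf_gips(ipc, clock_ghz):
    """Billions of instructions per second for a given IPC and clock."""
    return ipc * clock_ghz

# A wider core at a lower clock can beat a narrower core at a higher clock.
narrow_fast = perf_gips(ipc=1.5, clock_ghz=4.7)  # 7.05 GIPS
wide_slow = perf_gips(ipc=2.5, clock_ghz=4.0)    # 10.0 GIPS
print(wide_slow > narrow_fast)  # True
```

Real chips are far messier than this (IPC varies per workload, as discussed below), but it shows why frequency alone tells you very little.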
 
It's also worth noting that designing a CPU for higher clocks can mean sacrificing a little on instructions per clock, because the maximum theoretical clock speed of an architecture is limited by the time taken to complete the longest combinational logic path (the inverse of this so-called critical path is the maximum clock).

A unit's design will often start heavily combinational (i.e. without memory/registers within it, so the logic gates are connected directly and the signal passes through "instantaneously"). However, there is a small delay as the signal passes through each logic gate (arrangements of CMOS N- or P-type transistors forming AND/OR/NOR/etc. gates, the building blocks of more complex logic). The sum of these delays is the time taken for a path to complete, and larger, more complex blocks take longer to complete (on the nanoseconds scale).

These blocks can then be broken up by inserting registers (memory) within them, which store a signal that can be read out on the following clock cycle. This helps break a long path into a collection of shorter paths, at the cost of instruction latency, as an instruction now takes more cycles to complete. On modern desktop CPUs these pipelines are often 10-20 stages deep, i.e. an instruction takes several cycles at a minimum to complete, but as long as there's no branch in the code (or the branch predictor guessed the outcome of the branch correctly), a unit can still theoretically pump out one instruction per cycle once the pipeline is full.

However, if a branch prediction gets things wrong, all the data in the pipeline after that branch is based on a wrong assumption, so the unit has to be flushed and restarted with an empty pipeline. This can make long-pipeline designs (Prescott, Bulldozer) particularly inefficient with branchy code that results in lots of misses, especially with the combined impact on cache use.
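The flush cost described above is often approximated with the textbook formula effective CPI = base CPI + (branch fraction × miss rate × flush penalty). All figures below are illustrative, not measurements of any real design:

```python
# Illustrative model of how mispredict flushes hurt a deep pipeline.
def effective_cpi(base_cpi, branch_frac, miss_rate, flush_penalty):
    """Average cycles per instruction once mispredict flushes are included."""
    return base_cpi + branch_frac * miss_rate * flush_penalty

# A 20-stage pipeline pays a bigger flush penalty than a 10-stage one,
# so the same miss rate costs it more average cycles per instruction.
short_pipe = effective_cpi(1.0, branch_frac=0.2, miss_rate=0.05, flush_penalty=10)
long_pipe = effective_cpi(1.0, branch_frac=0.2, miss_rate=0.05, flush_penalty=20)
print(short_pipe, long_pipe)  # 1.1 1.2
```

With a worse predictor (say a 25% miss rate, closer to branchy code on a weak predictor), the long pipeline's disadvantage grows proportionally, which is the Prescott/Bulldozer problem in miniature.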

A big part of modern CPU performance gains and research comes down to the branch predictor: the higher the percentage of guesses it gets right, the more time you have a full pipeline and a cache full of useful data. With the gap that still exists between how quickly a processor can do calculations and how quickly system memory can feed it, how quick a processor is in practice often comes down to how well fed you can keep it with relevant data. So even a design with insanely high IPC and clock speed could end up practically useless if every branch miss sets it back to zero (kinda Bulldozer's downfall).
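For a flavour of how a predictor "guesses", here's a toy two-bit saturating counter, a classic textbook scheme far simpler than anything in a modern core:

```python
# Toy 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly predicting "taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch is taken many times, then falls through once at the exit:
# the counter only mispredicts at the loop exit, not on every iteration.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]  # 9 loop iterations, then exit
hits = 0
for taken in outcomes:
    hits += p.predict() == taken
    p.update(taken)
print(hits)  # 9 correct out of 10
```

Real predictors track history per branch and correlate across branches, but the principle is the same: learn the pattern, keep the pipeline full.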
 
Thanks guys,

Some of that went way over my head but I think I get the gist of it.

From what I can make out, though slower, if you can transfer the data in bulk there are less likely to be errors. Smaller amounts of data can be transferred quicker, but at the cost of error issues, and I'm guessing heat at some point also?

Thanks again.

Lee.
 
Yeah, basically: simpler logic blocks, or complex blocks broken up into simpler sections, allow faster clock speeds. Meanwhile, a slower target clock speed allows more time between each clock pulse for signals to travel through the combinational logic blocks where the calculations occur, which lets you design longer, more complex blocks with a higher maximum instructions per clock. Even if that means the architecture is hard-limited to a certain low clock speed, performance benefits can be found, as more work may be done per clock cycle.
But then complex blocks broken up into too many stages will also pay a larger penalty if the program branches (i.e. a choice is made that changes the path of execution, resulting in a Jump/GOTO/Branch statement depending on the assembly language, represented by IF statements and the like in more abstract programming languages like C), as this means all the data pre-emptively calculated and stuffed down the pipeline was for a useless line of execution. So it's a careful balance in that aspect too.
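You can actually see this mapping from `if` to a conditional jump without touching assembly: CPython compiles an `if` into a jump in its bytecode, visible with the standard `dis` module (exact opcode names vary between Python versions, but all include a JUMP):

```python
import dis

def pick(x):
    # The `if` below becomes a conditional jump in the compiled bytecode,
    # just as an `if` in C becomes a conditional branch instruction.
    if x > 0:
        return "positive"
    return "non-positive"

ops = [ins.opname for ins in dis.get_instructions(pick)]
print(any("JUMP" in op for op in ops))  # True on CPython
```

It's the machine-code equivalent of these jumps that the branch predictor has to guess ahead of time.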

This isn't delving into many of the other challenges of complex pipelines, like control hazards and such, but it should give an idea that it's about finding the right balance at the apex of a collection of performance curves (curves that can change wildly from program to program), rather than pushing any one of those curves to its maximum.

In many ways, this is one big thing Bulldozer got wrong: it had an extremely high miss penalty whenever a program branched and the prediction was wrong (which was a lot), which killed its efficiency as most of its work would be wasted. Yet when it was well fed with workloads it was designed for (integer-heavy, minimal branching, highly parallel) it was a bit of a monster.
 
Thanks guys,

Some of that went way over my head but I think I get the gist of it.

From what I can make out, though slower, if you can transfer the data in bulk there are less likely to be errors. Smaller amounts of data can be transferred quicker, but at the cost of error issues, and I'm guessing heat at some point also?

Thanks again.

Lee.

First thing to learn is IPC: instructions per clock, i.e. how much a CPU can crunch at its given speed, but more importantly at its given speed compared to other CPUs.

So let's say you have an AMD FX-8350 clocked to 5GHz, and an Intel 8086K clocked at 5GHz. Why does the Intel CPU splatter it? Because of IPC. Yes, the frequency may be the same, but the Intel is able to do far more at that speed because of its higher IPC.

So even if you reduced the frequency to, say, 3GHz, you would still see the same offset in performance, because the Intel is doing so much more at any given speed.
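That "same offset at any clock" point can be sketched numerically. The IPC figures below are hypothetical, chosen only to illustrate the ratio staying constant:

```python
# Hypothetical IPC figures purely to illustrate the point, not measured values.
ipc = {"FX-8350": 1.0, "i7-8086K": 2.0}

def rel_perf(cpu, clock_ghz):
    """Relative performance: IPC times clock, in arbitrary units."""
    return ipc[cpu] * clock_ghz

# At 5GHz the higher-IPC chip leads by a factor of two...
at_5ghz = rel_perf("i7-8086K", 5.0) / rel_perf("FX-8350", 5.0)
# ...and the lead is exactly the same if both are dropped to 3GHz.
at_3ghz = rel_perf("i7-8086K", 3.0) / rel_perf("FX-8350", 3.0)
print(at_5ghz, at_3ghz)  # 2.0 2.0
```

In practice the ratio drifts a little with clock (see the memory-bound discussion below in the thread), but as a first approximation it holds.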

It's why even if you clock a Core 2 Quad to 4GHz it can't compete with my Broadwell-E CPU running at 2.3GHz, even when only four cores are being used.

I hope that simplifies it a little for you, as I know what it is like trying to learn when everything sounds so god damned complicated :)
 
It's worth noting IPC isn't a characteristic intrinsic to an architecture. It's a figure that varies wildly from workload to workload, with different architectures favouring different loads and underlying instructions, and it's impacted by an endless list of external factors. It also varies with clock speed, as architectures often become less efficient at higher clocks (compounded by the fact that transistor switching also becomes less efficient due to the higher voltages required).

These non-linear performance gains from clock speed exist because many architectures are still inherently memory-bound. Bulldozer is vastly more efficient below 3GHz (Bulldozer-based APUs still account for a good portion of AMD's OEM sales), partly because many of the penalties are constant in time terms rather than cycle terms, so lengthening each cycle by reducing the clock speed means fewer wasted cycles and fewer "bubbles" in the pipeline caused by cache misses.
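The non-linear scaling can be sketched by treating a cache miss as costing a fixed number of nanoseconds rather than a fixed number of cycles. All numbers here are illustrative:

```python
# If a cache miss costs a fixed ~80 ns, a faster clock wastes more *cycles*
# per miss, so performance doesn't scale linearly with frequency.
def instr_per_ns(clock_ghz, miss_rate=0.01, miss_ns=80.0):
    """Instructions per nanosecond under a fixed-time memory stall model."""
    cycle_ns = 1.0 / clock_ghz
    # Average time per instruction: one cycle plus the expected stall time.
    avg_ns = cycle_ns + miss_rate * miss_ns
    return 1.0 / avg_ns

low = instr_per_ns(2.0)   # 0.5 ns/cycle + 0.8 ns expected stall
high = instr_per_ns(4.0)  # 0.25 ns/cycle + 0.8 ns expected stall
print(high / low)  # ~1.24: doubling the clock gives nowhere near 2x
```

This is also why a 20% overclock rarely delivers a 20% real-world gain: the memory stalls don't speed up with the core.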
 
It's worth noting IPC isn't a characteristic intrinsic to an architecture

It's that sort of speak you need to avoid when teaching people. Just try and put it in layman's terms, because just because you understand it doesn't mean they will, especially if you complicate it.

People need a basic grasp on things before you start reeling off tons of text, dude. It always annoys me when people do tutorial videos on YouTube and they drag on and on because the person "teaching you" likes the sound of their own voice. Plus, the last thing needed when teaching people is making it too logical and thus very drawn out and boring.

Like someone who constantly digresses because he likes the word. Or rather, the sound of his own voice.

None of us walked into technology and learned everything in a day, so you can't explain things with X years of knowledge by making it complicated.
 
That really depends on how someone likes to learn. Personally I think the nuances are what make something interesting, and I always enjoyed lecturers who would tangent to give wider perspective. But I don't think I got too heavy on the details; most of it is easy to understand if you think of it as analogous to other real-world phenomena, like how some cars are suited to some tracks better than others, and how "speed" in the real world isn't an absolute. I don't think it's particularly complex to outline that there are nuances, and some of the basic factors that create said nuances. But you did just write about as much text complaining about the length of my explanation as that particular explanation itself. I don't expect the guy to know what every single term is inside and out, but a loose idea, and knowing what you don't know, is the first step to knowing where to start learning if you're particularly interested in a topic.
 
That really depends on how someone likes to learn. Personally I think the nuances are what make something interesting, and I always enjoyed lecturers who would tangent to give wider perspective. But I don't think I got too heavy on the details; most of it is easy to understand if you think of it as analogous to other real-world phenomena, like how some cars are suited to some tracks better than others, and how "speed" in the real world isn't an absolute. I don't think it's particularly complex to outline that there are nuances, and some of the basic factors that create said nuances. But you did just write about as much text complaining about the length of my explanation as that particular explanation itself.

It's the way you are wording it, dude.

I have an IQ of 152 on my best days (because it's all mood dependent) however, I find learning things quite hard.

Remember: "A question is only hard if you do not know the answer". What you are trying to do is explain things in the same way you would explain them to yourself. What if the person you are trying to teach does not have the same IQ as you? Or has a higher IQ but has no idea what you are talking about?

You're wasting your breath.

I used to be heavily into car audio. Like, all of the really geeky stuff: Thiele/Small parameters, Xmax, cabinet designing etc. I could sit here and waffle on all day about it, but that doesn't mean anyone would bother reading it.

No offence, but I don't bother reading most of your posts. Like you say in your sig, you like rambling, but that doesn't mean it is interesting for others to read.

I taught a kid how to build and water cool a PC the other week. I would bet hardly any of it has actually gone in and stuck. I didn't have time to pour 40 years of experience into him without it taking 40 years.

Computers used to be a totally geek thing to do, mostly because you needed a very high IQ to learn and understand it all. These days more people are interested in computers and computing, and thus you are going to get people who lack the skills to take in very complex instructions. Plus, having had a high IQ from childhood, I had to be careful not to over-geek things or people would just stand there with no idea what I was rambling on about.

Short answers are always the best.
 
I don't see why IQ comes into it; I didn't get into any maths or anything, and most of what I'm saying isn't really even specific to computers - you can apply this concept of work distribution to many assembly lines and such. I'm not tryna teach him the ins and outs of anything, just give pointers on general areas and topics that are relevant to the question and how they work together.

I'm not too sure what your problem is, tbh; I'm not forcing either of you to read what I say. I know not everything I say will be accessible to everyone, but it's sometimes a fine line between being accurate and being accessible, and I feel other people already have the latter part well covered. This isn't how I'd explain things to myself, more how I'd explain them to, say, a mechanical engineer with an interest in the topic, because at the end of the day these concepts are no more complicated than many we're familiar with in day-to-day life; they're just harder to observe. If you really want to learn about them, then obviously just search for videos on the stuff mentioned and look at diagrams of it all.

This is the video often used to give an overview nowadays if you wanted a further understanding:
https://www.youtube.com/watch?v=cNN_tTXABUA
 
I don't see why IQ comes into it; I didn't get into any maths or anything, and most of what I'm saying isn't really even specific to computers - you can apply this concept of work distribution to many assembly lines and such.

Of course it does. Look, you seem to be a very intelligent, knowledgeable guy. However, when you ramble on about technicalities and god knows what not people simply won't read it, or will try to and just get bored.

You're taking your knowledge for granted and expecting everyone else to keep up. Or to be as into the fine detail as you are.

The world doesn't work like that.
 
I don't like to assume people want quick and easy explanations of everything, because the internet is saturated with them. I think if you're going to simplify a concept to the point it no longer holds true in many real-world applications, you should at least give a little clause so that someone doesn't assume it's an absolute, because that's a mistake many in the tech community make, and it can lead to all sorts of other wrong conclusions if someone starts to get deep into it, or even just when it comes to basic buying decisions. Knowing that a 20% overclock won't give you a 20%/consistent performance boost and such can be quite useful, especially as some synthetic benches *can* scale abnormally well with clock speed due to these factors. Otherwise certain things that are quite normal can appear quite weird, and people start to assume things are wrong when in fact they're quite alright.

It's worth noting these are fundamentally GCSE/A-Level topics nowadays; the modern syllabuses go far deeper than the ICT rubbish we were unloaded with, and now we have 11-year-olds with relatively solid grasps of languages like Python, and by 18 most A-Level Computer Science students will understand the bulk of what I've said here. If anything, if my explanations seem crazily unintuitive, I'm probably just too young for you guys and start with a different set of basic assumptions.
 
As someone whose shoe size is 43 European, I find your way of writing absolutely fine, tgrech.
 
Wow.. information overload but all very interesting, what I understand of it anyway.

Sorry, I didn't mean to cause any arguments, so to speak, but thank you for taking the time to explain things. I probably have the IQ of 5 but I'm slowly getting to grips with some of this techy talk. :)

I just enjoy building computers, using them and playing around testing things. The basics are helpful even in just understanding the reason why you're doing something and how it works.

Thanks again. Big :)
 
Wow.. information overload but all very interesting, what I understand of it anyway.

Sorry, I didn't mean to cause any arguments, so to speak, but thank you for taking the time to explain things. I probably have the IQ of 5 but I'm slowly getting to grips with some of this techy talk. :)

I just enjoy building computers, using them and playing around testing things. The basics are helpful even in just understanding the reason why you're doing something and how it works.

Thanks again. Big :)

That was what I feared. It's like me asking how to connect an 8 pin and getting this in response.

" Maybe use iconic subinterlink with the ionizing master systems and the promethean phaser type-8 with psionic replicators instead?"

All well and good if you understand it; gobbledygook if you don't.
 