Measuring CPU to Memory speed ratio...

redhat_doom

New member
Hi All,

As you know, due to memory latency, performing one CPU operation is faster than performing one memory operation. But how can we measure the ratio of CPU speed to memory speed? Is there any software for doing so? Thanks.
 
You'd have to expand on what you're calling a CPU operation.

If you're breaking it down to tick speed, then you're talking about clock speed, which is obviously the speed of the CPU in Hz. You can issue a machine-language instruction that takes a set number of clock cycles; I believe the minimum would be 3, 4, or 5 for the most minimal of instructions. A no-operation (NOP) command, maybe, which merely bumps the program counter to the next instruction. Even this takes a number of clock counts, as several things happen during its execution even though it does almost nothing.

To compare that with memory speed, you would have to base it on how many commands, and how quickly, the fastest possible CPU could issue them. Reading is always quicker, and to keep the measurement from being hampered by a slow CPU (memory is merely chips without something driving it), it would have to be tested with the fastest CPU available. Against this, the CPU figure taken previously includes both reading and writing in some fashion, and relies on the CPU's ability to read and write its own cache. Without that, the memory and CPU figures are too dependent on each other to be worth much investigation.

In all honesty, near-scientific figures like that would mean nothing even to people studying computer science; you'd need inherently hardware-minded people to understand them.

For any ratio or figure to be understandable to most people, if you like, you'd need to keep it as basic as referring merely to clock or access speeds, which, to be honest, can be taken as a function of the clock speeds in the BIOS, massaged perhaps by the complete memory timing cycle and the CPU's time to complete an instruction. That is still complicated even for the keenest enthusiast.

How about just clock speeds? Or is that too simplistic?
 
Thanks. By a CPU operation, I mean one addition or subtraction, and by a memory operation I mean one read or write. I think clock speed alone is too simplistic. What I actually want to know is, on average and on a given machine, how long it takes to perform one addition/subtraction operation, and similarly how long it takes to read or write, say, one integer value from or to memory. Now, what do you think? Thanks again.
 
Honestly, I don't have much confidence in any program, particularly one launched from an OS, to measure this. There are too many things going on, or perhaps more importantly not going on, at the same moment in time for the results to carry much weight.

I would, on the other hand, look at simple programs along the lines of Memtest, booted into directly from DOS.

Indeed, if it recognizes the memory it's given, such a tool can report readings that point us in the right direction.

It would be people who have programmed similarly simple tasks, executable outside any OS (DOS forgiven, maybe), who could give good-enough answers.

I'd hunt down, with something like Google, the programmers and programs that have spent time doing this. Some may well be very old, as the basic x86 instructions have not changed in decades.

ADD and SUB (which are really the same command in theory) involve relatively long sequences: stack pointers, program counters, bit operations, and at least two other steps, just to fetch the instruction. Executing it requires some more. The sign of an integer is really neither here nor there in machine language; the hardware on its own only understands $0-$F, or powers of 16 built from that. A sign bit is what makes the value positive or negative, and in fairness, interpreting that is yet another operation.

Hardcore machine-language DOS executable routines would be my first searching point. And I don't mean "geek" in any negative way.
 
Thanks, but now I'm quite confused! I don't want to go that deep. One way to measure CPU speed is to first write a loop like the following and then measure the time needed to execute it, say T1:

t = clock();
for (i = 0; i < 10000000; i++)
{
}
T1 = clock() - t;

Then we add a simple addition operation to this loop:

t = clock();
for (i = 0; i < 10000000; i++)
{
    j++;
}
T2 = clock() - t;

Now I think (T2 - T1) / 10000000 gives the average time needed to do one addition operation. Am I right? Thanks.
 
That's pretty much a reproducible, simple task that you could write in BASIC, Pascal, C, and pretty much any scripted, compiled, or interpreted language. The syntax of the loop is the only thing that would need massaging; some languages may need variable declarations.

Yes, it will give you an average of how long the interpreted code takes, but the difference between the simplest source statement and the resulting machine language can be enormous.

If it were wholly necessary to use a higher-level language to write the program, something like C gives you the ability to include assembly in your code. That way you can have all your boilerplate code for window gadgets, requesters, and so on, with something like a "Run" button the user presses to run your assembly.

Assembly, 68k being my favourite after the 6502, but it looks pretty much the same in all machine languages, as it's merely a simple loop:

move.l $counter_address,d0   ; read the hardware tick counter (start)
move.l #$98967f,d1           ; 10000000 - 1
loop: add.b #$01,d2          ; the addition being timed
dbra d1,loop                 ; decrement d1, branch back until it underflows
sub.l $counter_address,d0    ; start minus end: elapsed ticks (count-down timer assumed)
rts
; (caveat: dbra only counts the low word of d1, so a count this large
; would strictly need an outer loop as well)

This returns in d0 the time it took to complete the loop. You can do what you like with d0 after that in the higher-level language.

The problem you have is that there isn't one single command for adding. You have longword, word, and byte-wise adds in 68k; similarly, other machine languages have their variations, especially on 32- and 64-bit, and even pseudo-32-bit on 16-bit machines.

If you look at a machine-language reference manual, generally a small pocket book detailing all the processor's instructions, it will give the clock cycles each command takes to execute, along with its longword/word/byte variations.

Similarly, other machine languages won't have a compound instruction like dbra, which is "decrement and branch unless the result is negative". Without it, you would have to contaminate your measurement further with a separate "subtract 1" on the 10000000 counter, a compare command, and another branch command; that is what compilers produce if you're lucky, and I believe x86 still does.

What we're saying here is that you wouldn't be averaging the time to execute an addition command; it's the addition plus the commands that control the loop.

Theoretically, if you did:

move.l $counter_address,d3   ; start of the empty loop
move.l #$98967f,d1           ; 10000000 - 1
loop1: dbra d1,loop1         ; empty loop: loop overhead only
sub.l $counter_address,d3    ; d3 = empty-loop time
move.l $counter_address,d0   ; start of the add loop
move.l #$98967f,d1           ; 10000000 - 1
loop2: add.b #$01,d2         ; the addition being timed
dbra d1,loop2
sub.l $counter_address,d0    ; d0 = add-loop time
sub.l d3,d0                  ; subtract the empty-loop time
rts

You would be taking the time for a loop with no instruction in it away from the time for the one with the addition. This "should" cancel out the loop-control timing.

(Of course, before calling these routines, the original values of the d0-d3 registers would be saved.)

Now, in comparison to this, you can test a read-from-memory-address command. In theory, this tests memory reading. Again you have byte, word, and longword variations, but just change the loop to:

move.l $counter_address,d3   ; start of the empty loop
move.l #$98967f,d1           ; 10000000 - 1
loop1: dbra d1,loop1         ; empty loop: loop overhead only
sub.l $counter_address,d3    ; d3 = empty-loop time
move.l $counter_address,d0   ; start of the read loop
move.l #$98967f,d1           ; 10000000 - 1
loop2: move.b $some_free_address,d2   ; the memory read being timed (any location in the free memory pool)
dbra d1,loop2
sub.l $counter_address,d0    ; d0 = read-loop time
sub.l d3,d0                  ; subtract the empty-loop time
rts

Here you will have two values that define byte-wise command speeds. IIRC, byte operations and word operations are said to take the same time.

This, of course, is meaningless if you just use a loop scripted or compiled from your example as-is.

What I would suggest, to make that rough example much more specific to the j++ command, is something similar:

t = clock();
for (i = 0; i < 10000000; i++)
{
}
T1 = clock() - t;

t = clock();
for (i = 0; i < 10000000; i++)
{
    j++;
}
T2 = clock() - t - T1;

Here again you've done your best to remove the time the PC takes to execute the bare loop itself. Personally, I'd question its overall accuracy because of what the machine language looks like after a compile, but it would be the best attempt in a higher-level language.
 