Wednesday, 12 July 2017

Hazard (computer architecture)

https://en.wikipedia.org/wiki/Hazard_(computer_architecture)


Background

Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being processed in the various stages of the pipeline, such as fetch and execute. There are many different instruction pipeline microarchitectures, and instructions may be executed out-of-order. A hazard occurs when two or more of these simultaneous (possibly out of order) instructions conflict.

Types

Data hazards

Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur:
  1. read after write (RAW), a true dependency
  2. write after read (WAR), an anti-dependency
  3. write after write (WAW), an output dependency
Consider two instructions i1 and i2, with i1 occurring before i2 in program order.

Read after write (RAW)

(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline.
Example:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3

The first instruction calculates a value to be saved in register R2, and the second uses this value to compute a result for register R4. However, in a pipeline, when operands are fetched for the second operation, the result of the first has not yet been saved. Instruction i2 therefore has a data dependency on i1: it depends on the completion of i1.

Write after read (WAR)

(i2 tries to write a destination before it is read by i1) A write after read (WAR) data hazard represents a problem with concurrent execution.
Example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2

In any situation where i2 may finish before i1 (i.e., with concurrent execution), it must be ensured that the new value of register R5 is not stored before i1 has had a chance to fetch its operands.

Write after write (WAW)

(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard may occur in a concurrent execution environment.
Example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3

The write back (WB) of i2 must be delayed until i1 finishes executing.
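
The three cases above can be checked mechanically by comparing register numbers. The following is a minimal sketch, assuming every instruction has the three-operand form `dest <- src1 + src2` used in the examples. The `Instr` type, the `HAZARD_*` flags and `classify_hazards` are hypothetical names introduced only for illustration. The function reports dependencies, that is, potential hazards; whether one becomes an actual hazard depends on the pipeline's timing.

```c
/* Hypothetical three-operand instruction: dest <- src1 + src2.
   Registers are identified by their number (R2 is 2, and so on). */
typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

/* Bit flags, so one pair of instructions can report several dependencies. */
enum { HAZARD_NONE = 0, HAZARD_RAW = 1, HAZARD_WAR = 2, HAZARD_WAW = 4 };

/* i1 occurs before i2 in program order. */
int classify_hazards(Instr i1, Instr i2) {
    int found = HAZARD_NONE;

    /* RAW: i2 reads a register that i1 writes. */
    if (i2.src1 == i1.dest || i2.src2 == i1.dest) found |= HAZARD_RAW;

    /* WAR: i2 writes a register that i1 reads. */
    if (i2.dest == i1.src1 || i2.dest == i1.src2) found |= HAZARD_WAR;

    /* WAW: both instructions write the same register. */
    if (i2.dest == i1.dest) found |= HAZARD_WAW;

    return found;
}
```

Calling `classify_hazards((Instr){2, 1, 3}, (Instr){4, 2, 3})` corresponds to the RAW example, since i2 reads R2, which i1 writes.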

Structural hazards

A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A canonical example is a single memory unit that is accessed both in the fetch stage, where an instruction is retrieved from memory, and in the memory stage, where data is written to and/or read from memory.[3] Structural hazards can often be resolved by separating the component into orthogonal units (such as separate caches) or by bubbling the pipeline.

Control hazards (branch hazards)

Branching hazards (also termed control hazards) occur with branches. On many instruction pipeline microarchitectures, the processor will not know the outcome of the branch when it needs to insert a new instruction into the pipeline (normally the fetch stage).

What is idempotent?

Idempotent
Math def:  ƒ(ƒ(x)) ≡ ƒ(x)
Chinese: 幂等

For example: max(x, x) = max(max(x, x), x)
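
The definition ƒ(ƒ(x)) ≡ ƒ(x) can be spot-checked in code. This is a small sketch; `unary_fn`, `is_idempotent_at`, `abs_value` and `increment` are hypothetical names chosen for illustration. Passing the check for one input is only evidence at that point, not a proof for all x.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef int (*unary_fn)(int);

/* Returns true when f(f(x)) == f(x) for the single sample input x. */
bool is_idempotent_at(unary_fn f, int x) {
    return f(f(x)) == f(x);
}

/* abs is idempotent: taking the absolute value twice changes nothing more. */
int abs_value(int x) { return abs(x); }

/* x + 1 is not idempotent: applying it twice gives x + 2. */
int increment(int x) { return x + 1; }
```

For instance, `is_idempotent_at(abs_value, -5)` holds, while `is_idempotent_at(increment, 1)` does not.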
 

How to count the execution time inside a function?

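Use __builtin_readcyclecounter (a Clang builtin), like:

#include <stdio.h>

int foo(void) {
  /* The builtin returns an unsigned 64-bit cycle count. */
  unsigned long long start = __builtin_readcyclecounter();
  /* CODE to be TESTED */
  unsigned long long end = __builtin_readcyclecounter();
  unsigned long long diff = end - start;
  printf("%llu cycles\n", diff);
  return 0;
}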

Thursday, 6 July 2017

VSX Scalar Convert Signed Integer Doubleword to floating-point format and round to Single-Precision XX2-form (XSCVSXDSP/xscvsxdsp)


xscvsxdsp XT,XB
reset_xflags()
src ← ConvertSDtoDP(VSR[32×BX+B].dword[0])
result ← RoundToSP(RN,src)
VSR[32×TX+T].dword[0] ← ConvertSPtoSP64(result)
VSR[32×TX+T].dword[1] ← 0xUUUU_UUUU_UUUU_UUUU
if(xx_flag) then SetFX(XX)
FPRF ← ClassSP(result)
FR ← inc_flag
FI ← xx_flag
Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B.
Let src be the two’s-complement integer value in doubleword element 0 of VSR[XB].
src is converted to floating-point format, and rounded to single-precision using the rounding mode specified by RN.
The result is placed into doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF is set to the class and sign of the result as represented in single-precision format. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.
Special Registers Altered
FPRF FR FI FX XX
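
The data path described above (signed doubleword in, single-precision rounding, result held in double-precision format) can be approximated in plain C. This is only a sketch under assumptions: `xscvsxdsp_model` is a hypothetical name, the host is assumed to use IEEE 754 `float` and `double`, the C integer-to-float cast stands in for the rounding selected by RN (the host's current rounding mode, round-to-nearest-even by default), and the FPRF/FR/FI/FX/XX status updates are not modelled.

```c
#include <stdint.h>

/* Hypothetical software model of the value written to doubleword element 0. */
double xscvsxdsp_model(int64_t src) {
    /* Integer -> single precision: the only step that can round. */
    float sp = (float)src;

    /* Single -> double: every float value is exactly representable as a double. */
    return (double)sp;
}
```

Values that need more than 24 significant bits, such as 16777217 (2^24 + 1), cannot be represented exactly in single precision, so the result differs from the input; this is the case in which the real instruction would set FI.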

Load VSX Scalar as Integer Word Algebraic Indexed X-form (LXSIWAX/lxsiwax)


lxsiwax XT,RA,RB


if MSR.VSX=0 then VSX_Unavailable()
EA ← ( (RA=0) ? 0 : GPR[RA] ) + GPR[RB]
VSR[32×TX+T].dword[0] ← EXTS64(MEM(EA,4))
VSR[32×TX+T].dword[1] ← 0xUUUU_UUUU_UUUU_UUUU

Let XT be the value 32×TX + T.
Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB].
When Big-Endian byte ordering is employed, the contents of the word in storage at address EA are placed into load_data in such an order that;
– the contents of the byte in storage at address EA are placed into byte 0 of load_data,
– the contents of the byte in storage at address EA+1 are placed into byte 1 of load_data,
– the contents of the byte in storage at address EA+2 are placed into byte 2 of load_data, and
– the contents of the byte in storage at address EA+3 are placed into byte 3 of load_data.
When Little-Endian byte ordering is employed, the contents of the word in storage at address EA are placed into load_data in such an order that;
– the contents of the byte in storage at address EA are placed into byte 3 of load_data,
– the contents of the byte in storage at address EA+1 are placed into byte 2 of load_data,
– the contents of the byte in storage at address EA+2 are placed into byte 1 of load_data, and
– the contents of the byte in storage at address EA+3 are placed into byte 0 of load_data.
load_data is sign-extended to a doubleword and placed in doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are undefined.
Special Registers Altered
None
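
The address calculation and byte ordering described above can be sketched in C. This is an illustrative model, not the hardware definition: `effective_address` and `load_word_algebraic` are hypothetical names, storage is modelled as a plain byte array, byte 0 of load_data is taken to be the most significant byte, and the MSR.VSX availability check is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* EA = (RA == 0 ? 0 : GPR[RA]) + GPR[RB] */
uint64_t effective_address(const uint64_t gpr[32], unsigned ra, unsigned rb) {
    return (ra == 0 ? 0 : gpr[ra]) + gpr[rb];
}

/* Builds load_data from the 4 bytes at mem[ea .. ea+3], then sign-extends
   it to 64 bits (EXTS64). The result models doubleword element 0 of VSR[XT]. */
int64_t load_word_algebraic(const uint8_t *mem, uint64_t ea, bool big_endian) {
    uint32_t load_data = 0;

    for (unsigned i = 0; i < 4; i++) {
        /* Big-endian: byte at EA+i goes to byte i of load_data.
           Little-endian: byte at EA+i goes to byte 3-i of load_data. */
        unsigned byte_index = big_endian ? i : 3 - i;
        load_data |= (uint32_t)mem[ea + i] << (8 * (3 - byte_index));
    }

    /* Assumes a two's-complement host for the uint32_t -> int32_t cast. */
    return (int64_t)(int32_t)load_data;
}
```

With the bytes FF FF FF FE in storage, the big-endian interpretation yields -2, whereas the little-endian interpretation assembles 0xFEFFFFFF before sign extension.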