Technology Blog: 2017

Wednesday, 12 July 2017

Hazard (computer architecture)

https://en.wikipedia.org/wiki/Hazard_(computer_architecture)

Background

Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being processed in the various stages of the pipeline, such as fetch and execute. There are many different instruction pipeline microarchitectures, and instructions may be executed out-of-order. A hazard occurs when two or more of these simultaneous (possibly out of order) instructions conflict.

Types

Data hazards

Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur:

read after write (RAW), a true dependency
write after read (WAR), an anti-dependency
write after write (WAW), an output dependency

Consider two instructions i1 and i2, with i1 occurring before i2 in program order.

Read after write (RAW)

(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline.

Example

For example:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3
The first instruction is calculating a value to be saved in register R2, and the second is going to use this value to compute a result for register R4. However, in a pipeline, when operands are fetched for the 2nd operation, the results from the first will not yet have been saved, and hence a data dependency occurs.
A data dependency occurs with instruction i2, as it is dependent on the completion of instruction i1.

Write after read (WAR)

(i2 tries to write a destination before it is read by i1) A write after read (WAR) data hazard represents a problem with concurrent execution.

Example

For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2
In any situation with a chance that i2 may finish before i1 (i.e., with concurrent execution), it must be ensured that the result of register R5 is not stored before i1 has had a chance to fetch the operands.

Write after write (WAW)

(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard may occur in a concurrent execution environment.

Example

For example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
The write back (WB) of i2 must be delayed until i1 finishes executing.
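
The three data hazard cases can be checked mechanically from the register numbers alone. The following is a minimal sketch, assuming a hypothetical three-register instruction format (one destination, two sources); the `Instr` type, the `HAZARD_*` flags and `classify_hazards` are illustrative names, not part of any real ISA or toolchain.

```c
#include <stdbool.h>

/* Hypothetical instruction shape: dst <- src1 op src2 (register numbers). */
typedef struct {
    int dst;
    int src1;
    int src2;
} Instr;

/* Bit flags so that one pair of instructions can report several hazards. */
enum { HAZARD_RAW = 1, HAZARD_WAR = 2, HAZARD_WAW = 4 };

/* True if the instruction reads the given register as a source. */
static bool reads(const Instr *in, int reg) {
    return in->src1 == reg || in->src2 == reg;
}

/* i1 comes before i2 in program order. Returns a bitmask of HAZARD_* flags. */
int classify_hazards(const Instr *i1, const Instr *i2) {
    int mask = 0;
    if (reads(i2, i1->dst)) mask |= HAZARD_RAW; /* i2 reads what i1 writes  */
    if (reads(i1, i2->dst)) mask |= HAZARD_WAR; /* i2 writes what i1 reads  */
    if (i1->dst == i2->dst) mask |= HAZARD_WAW; /* both write the same reg  */
    return mask;
}
```

Feeding in the three example pairs from this section (with Rn written as the integer n) should flag RAW, WAR and WAW respectively.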

Structural hazards

A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A canonical example is a single memory unit that is accessed both in the fetch stage, where an instruction is retrieved from memory, and in the memory stage, where data is written to and/or read from memory. Structural hazards can often be resolved by separating the component into orthogonal units (such as separate caches) or by bubbling the pipeline.

Control hazards (branch hazards)

Branching hazards (also termed control hazards) occur with branches. On many instruction pipeline microarchitectures, the processor will not know the outcome of the branch when it needs to insert a new instruction into the pipeline (normally the fetch stage).

What is idempotent?

Idempotent
Math definition: ƒ(ƒ(x)) ≡ ƒ(x)
Chinese term: 幂等

For example, max(x, x) = max(max(x, x), x)
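
The definition ƒ(ƒ(x)) ≡ ƒ(x) can be spot-checked in code. This is only a sketch: `is_idempotent_at`, `clamp_to_byte` and `increment` are made-up names for illustration, and checking a few inputs shows a counterexample or the lack of one, it does not prove idempotence.

```c
#include <stdlib.h>

/* Returns 1 if f(f(x)) == f(x) for this particular x, 0 otherwise. */
int is_idempotent_at(int (*f)(int), int x) {
    int once = f(x);
    return f(once) == once;
}

/* Idempotent: clamping an already clamped value changes nothing. */
int clamp_to_byte(int x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

/* Not idempotent: applying it twice differs from applying it once. */
int increment(int x) {
    return x + 1;
}
```

For instance, `is_idempotent_at(abs, -7)` and `is_idempotent_at(clamp_to_byte, 300)` hold, while `is_idempotent_at(increment, 1)` does not.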

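How to count the execution time inside of a function?

Use __builtin_readcyclecounter (a Clang builtin that returns the cycle count as an unsigned long long),
like:

#include <stdio.h>

int foo(void) {
    unsigned long long start = __builtin_readcyclecounter();
    /* CODE to be TESTED */
    unsigned long long end = __builtin_readcyclecounter();
    unsigned long long diff = end - start;  /* elapsed cycles */
    printf("%llu\n", diff);
    return 0;
}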

Thursday, 6 July 2017

VSX Scalar Convert Signed Integer Doubleword to floating-point format and round to Single-Precision XX2-form (XSCVSXDSP/xscvsxdsp)

xscvsxdsp XT,XB
reset_xflags()
src ← ConvertSDtoDP(VSR[32×BX+B].dword[0])
result ← RoundToSP(RN,src)
VSR[32×TX+T].dword[0] ← ConvertSPtoSP64(result)
VSR[32×TX+T].dword[1] ← 0xUUUU_UUUU_UUUU_UUUU
if(xx_flag) then SetFX(XX)
FPRF ← ClassSP(result)
FR ← inc_flag
FI ← xx_flag
Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B.
Let src be the two’s-complement integer value in
doubleword element 0 of VSR[XB].
src is converted to floating-point format, and rounded
to single-precision using the rounding mode specified
by RN.
The result is placed into doubleword element 0 of
VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are
undefined.
FPRF is set to the class and sign of the result as
represented in single-precision format. FR is set to
indicate if the result was incremented when rounded.
FI is set to indicate the result is inexact.
Special Registers Altered
FPRF FR FI FX XX
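
The data path of xscvsxdsp (integer in, value rounded to single precision, result kept in double-precision format) can be approximated in portable C. This is a rough sketch only: the function name is made up, the host's current rounding mode stands in for RN, and the FPSCR side effects (FPRF, FR, FI, FX, XX) are not modelled.

```c
#include <stdint.h>

/* Sketch of the value computed by xscvsxdsp for doubleword element 0. */
double convert_sd_to_sp_in_dp_format(int64_t src) {
    /* int64 -> float rounds once, to single precision. */
    float rounded = (float)src;
    /* float -> double is exact, so this only changes the storage format. */
    return (double)rounded;
}
```

With the default round-to-nearest-even mode, 16777217 (2^24 + 1) is not representable in single precision and comes back as 16777216.0.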

Load VSX Scalar as Integer Word Algebraic Indexed X-form (LXSIWAX/lxsiwax)

lxsiwax XT,RA,RB

if MSR.VSX=0 then VSX_Unavailable()
EA ← ( (RA=0) ? 0 : GPR[RA] ) + GPR[RB]
VSR[32×TX+T].dword[0] ← EXTS64(MEM(EA,4))
VSR[32×TX+T].dword[1] ← 0xUUUU_UUUU_UUUU_UUUU

Let XT be the value 32×TX + T.
Let EA be the sum of the contents of GPR[RA], or 0 if RA
is equal to 0, and the contents of GPR[RB].
When Big-Endian byte ordering is employed, the
contents of the word in storage at address EA are
placed into load_data in such an order that;
– the contents of the byte in storage at address EA
are placed into byte 0 of load_data,
– the contents of the byte in storage at address EA+1
are placed into byte 1 of load_data,
– the contents of the byte in storage at address EA+2
are placed into byte 2 of load_data, and
– the contents of the byte in storage at address EA+3
are placed into byte 3 of load_data.
When Little-Endian byte ordering is employed, the
contents of the word in storage at address EA are
placed into load_data in such an order that;
– the contents of the byte in storage at address EA
are placed into byte 3 of load_data,
– the contents of the byte in storage at address EA+1
are placed into byte 2 of load_data,
– the contents of the byte in storage at address EA+2
are placed into byte 1 of load_data, and
– the contents of the byte in storage at address EA+3
are placed into byte 0 of load_data.
load_data is sign-extended to a doubleword and
placed in doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are
undefined.
Special Registers Altered
None
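
The byte placement and sign extension described above can be sketched in C. This is an illustration under assumptions, not the ISA definition: `load_word_algebraic` and its `big_endian` flag are hypothetical, `mem` points at the byte addressed by EA, and the conversion to `int32_t` relies on the usual two's-complement behaviour of the host compiler.

```c
#include <stdint.h>

/* Builds load_data from 4 bytes at EA and sign-extends it to 64 bits.
   Byte 0 of load_data is the most significant byte. */
int64_t load_word_algebraic(const uint8_t *mem, int big_endian) {
    uint32_t load_data;
    if (big_endian) {
        /* EA -> byte 0, EA+1 -> byte 1, EA+2 -> byte 2, EA+3 -> byte 3 */
        load_data = ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
                    ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
    } else {
        /* EA -> byte 3, EA+1 -> byte 2, EA+2 -> byte 1, EA+3 -> byte 0 */
        load_data = ((uint32_t)mem[3] << 24) | ((uint32_t)mem[2] << 16) |
                    ((uint32_t)mem[1] << 8)  |  (uint32_t)mem[0];
    }
    /* EXTS64: reinterpret as signed 32-bit, then widen to 64 bits. */
    return (int64_t)(int32_t)load_data;
}
```

The same four bytes in storage therefore produce different values depending on the byte ordering, and a set top bit in load_data yields a negative doubleword.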

Wednesday, 14 June 2017

Assembler and Disassembler

LLVM can emit the generated hardware instructions in two different representations: the .s file (assembly) and the .o file (object file). They are just two forms of the same code. To generate the .o file from the .s file, call the assembler (usually /usr/bin/as). To go the other way, use objdump -d (-d means disassemble).

The assembler and disassembler together can be used to modify the assembly and do some testing.

e.g. :

clang a.c -S -O2 -o a.s

do some modifications to the a.s

clang a.s -o a.out (here clang implicitly invokes the assembler and the linker to generate the executable)

You can also do (equivalent to the above):

assemble:

/usr/bin/as a.s -o a.o

clang a.o -o a.out

To look at the assembly of a .o file or an executable

disassemble:

objdump -d modified.o > modified.o.disassembly

objdump -d modified.out > modified.out.disassembly

Monday, 12 June 2017

How to access global variables and TOC

int a = 2;
int b = 3;

int foo(void) {
printf("a+b=%d",a + b);
return 0;
}

foo:                                    # @foo
.Lfunc_begin0:
.Lfunc_gep0:
        addis r2, r12, .TOC.-.Lfunc_gep0@ha
        addi r2, r2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
        .localentry     foo, .Lfunc_lep0-.Lfunc_gep0
# BB#0:                                 # %entry
        mflr r0
        std r0, 16(r1) # save the link register to 16(r1), i.e. stackFrame + 16
        stdu r1, -96(r1) # prolog
        addis r3, r2, .LC0@toc@ha
        addis r4, r2, .LC1@toc@ha
        addis r12, r2, .L.str@toc@ha
        ld r3, .LC0@toc@l(r3)
        ld r4, .LC1@toc@l(r4)
        lwz r3, 0(r3)
        lwz r4, 0(r4)
        add r3, r4, r3
        extsw r4, r3
        addi r3, r12, .L.str@toc@l
        bl printf
        nop # placeholder: the linker may replace it with a TOC restore, depending on where the callee is defined
        li r3, 0
        addi r1, r1, 96 # epilog: the reverse of the stdu above
        ld r0, 16(r1) # load the saved link register value
        mtlr r0 # restore the link register value
        blr
        .long   0
        .quad   0
.Lfunc_end0:
        .size   foo, .Lfunc_end0-.Lfunc_begin0

Thursday, 1 June 2017

4.3.2 Data Cache Instructions (DCBT/dcbt)

\brief: DCBT is one of the Data Cache instructions used on PowerPC.

The Data Cache instructions control various aspects of
the data cache.
TH field in the dcbt and dcbtst instructions
Described below are the TH field values for the dcbt
and dcbtst instructions. For all TH field values which
are not listed, the hint provided by the instruction is
undefined.
TH=0b00000
If TH=0b00000, the dcbt/dcbtst instruction provides a
hint that the program will probably soon access the
block containing the byte addressed by EA.
TH=0b01000 - 0b01111
The dcbt/dcbtst instructions provide hints regarding a
sequence of accesses to data elements, or indicate the
expected use thereof. Such a sequence is called a
“data stream”, and a dcbt/dcbtst instruction in which
TH is set to one of these values is said to be a “data
stream variant” of dcbt/dcbtst. In the remainder of this
section, “data stream” may be abbreviated to “stream”.
A data stream to which a program may perform Load
accesses is said to be a “load data stream”, and is
described using the data stream variants of the dcbt
instruction. A data stream to which a program may perform
Store accesses is said to be a “store data stream”,
and is described using the data stream variants of the
dcbtst instruction.
When, and how often, effective addresses for a data
stream are translated is implementation-dependent.
Each data element is associated with a unit of storage,

Wednesday, 31 May 2017

How to create a new user as root?

If you are signed in as the root user, you can create a new user at any time by typing:

adduser newuser

If you are signed in as a non-root user who has been given sudo privileges, as demonstrated in the initial server setup guide, you can add a new user by typing:

sudo adduser newuser

Tuesday, 30 May 2017

Store VSX Scalar as Integer Halfword Indexed (stxsihx/STXSIHX)

X-form
stxsihx XS,RA,RB
Let XS be the value 32×SX + S.
Let the effective address (EA) be sum of the contents of
GPR[RA], or 0 if RA is equal to 0, and the contents of
GPR[RB].
The contents of halfword element 3 of VSR[XS] are
placed into the halfword in storage addressed by EA.
Special Registers Altered:
None
For comparison, stxsibx (XO = 909):
31 S RA RB 909 SX
0 6 11 16 21 31
if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable()
EA <= ((RA=0) ? 0 : GPR[RA]) + GPR[RB]
MEM(EA,1) <= VSR[32×SX+S].byte[7]
VSR Data Layout for stxsibx
src = VSR[XS]
unused .byte[7] unused
0 56 64 127

stxsihx (XO = 941):
31 S RA RB 941 SX
0 6 11 16 21 31
if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable()
EA <= ((RA=0) ? 0 : GPR[RA]) + GPR[RB]
MEM(EA,2) <= VSR[32×SX+S].hword[3]
VSR Data Layout for stxsihx
src = VSR[XS]
unused .hword[3] unused
0 48 64 127

STXSIBX is similar: it just stores byte 7 of VSR[XS] into the byte in storage addressed by EA.

llvm PPC Register classes [Implementation based]

98 Register classes?
Class name     Spill size (bits)

VSSRC          32
VSFRC          64
VSRC           128
VRRC           128
F4RC           32
F8RC           64
VSHRC          128
We have 64 VSRs (128-bit vector-scalar registers). The first 32 of them overlap with the FPRs (floating-point registers, which occupy the most significant 64 bits of each).
The last 32 of them overlap with the VRs (128-bit Altivec/VMX registers).

- VSSRC (Vector-Scalar, Single-precision) is a scalar register class consisting of 64 registers that can each hold an f32

- VSFRC (Vector-Scalar, Floating-point) is a scalar register class consisting of 64 registers that can each hold an f64

- VSRC is a register class consisting of 64 registers that can each hold a vector. This is currently restricted to `v4i32, v4f32, v2i64, v2f64` since those are the only types that we have meaningful operations on in the ISA.

VSSRC and VSFRC model the upper (most significant) 64 bits of VSR0-VSR63, i.e. of all 64 VSRs.

- VRRC is a register class consisting of 32 registers that each hold an Altivec (VMX) vector type; these are VSR32-VSR63

- F4RC is just the 32 floating point registers used for f32 values

So F4RC models the upper 64 bits of VSR0-VSR31 (the 32 floating-point registers)?

- F8RC are the 32 floating point registers that each hold an f64

Then we have a bit of complexity when it comes to GPRC/G8RC. On the one hand, it’s pretty simple: 32/64-bit integer registers respectively.
R0-R31 are the 32-bit ones. X0-X31 are the 64-bit ones. However, we have a number of instructions that treat R0/X0 as special: a zero in that operand means immediate zero, not the contents of register zero.
For such instructions we use `GPRC_NOR0` and `G8RC_NOX0`. They have a special register called `ZERO` and `ZERO8` respectively, and we mark this register as reserved. Then we scan the instructions that use this register class to see if they’re fed by a register that contains a zero. If so, we get rid of the instruction that defines that register and allocate `ZERO/ZERO8` instead,
because these instructions need the constant 0 rather than the contents of R0.

So we’ll catch stuff like this:
```
li 4, 0
lfsx 2, 4, 3
```
We can freely get rid of the first instruction and put `ZERO` instead of 4 in the `lfsx`.

If you use the wrong register class, you could get a weird SDAG selection error (cannot map REGA to REGB, or something similar).
FPR(i) = upper VSR(i)
isCodeGenOnly means the instruction is not suitable for assembly, disassembly, etc.

Error for using the wrong register class:
: error: In XXBRH: Type inference contradiction found, merging '{v4i32:v2i64:v4f32:v2f64}' into 'v8i16'
def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc,

    COPY_TO_REGCLASS must be used if you want to copy a value to a register class that doesn't support that type.
    E.g. $A is v1i128, which is not supported by VSRC, so we use COPY_TO_REGCLASS. We also copy the result of XXBRQ
    to VRRC, because after executing XXBRQ the result is still v1i128 even though $A was copied to VSRC with COPY_TO_REGCLASS,
    and VRRC is the only register class that supports v1i128.
    (v1i128 (COPY_TO_REGCLASS (XXBRQ (COPY_TO_REGCLASS $A, VSRC)), VRRC))>;

Wednesday, 24 May 2017

How to disable llvm opt optimization only? Leave the clang FE and llc optimization there

`-Xclang -disable-llvm-passes`
This is the option to disable just the LLVM optimization passes (opt).

Tuesday, 23 May 2017

How to rename a group of files on Unix?

a) Put the file names into a .sh file, one file name per line, and make sure there are
no empty lines.
b) In vim, run :%s#^\(.*\)$#rename \1 prefix_\1_suffix#
or b) :%s#^\(.*\)$#mv \1 prefix_\1_suffix# (if rename gives the error: Bareword "some_word" not allowed while "strict subs" in use at (eval 1) line number.)
c) Edit the names to what you like. E.g., here I had to change *_119793 to *; I first changed them all to *_1197938888, and then searched for _1197938888 and deleted all these suffixes.
d) Then change the mode of the script and run it. (chmod 777 rename.sh)

$ cat ~/bin/rename.sh
mv bindings_119793 bindings
mv cmake_119793 cmake
mv CMakeCache.txt_119793 CMakeCache.txt
mv CMakeFiles_119793 CMakeFiles
mv CMakeLists.txt_119793 CMakeLists.txt
mv CODE_OWNERS.TXT_119793 CODE_OWNERS.TXT
mv configure_119793 configure
mv CPackConfig.cmake_119793 CPackConfig.cmake
mv CPackSourceConfig.cmake_119793 CPackSourceConfig.cmake
mv CREDITS.TXT_119793 CREDITS.TXT
mv docs_119793 docs
mv examples_119793 examples
mv include_119793 include
mv lib_119793 lib
mv LICENSE.TXT_119793 LICENSE.TXT
mv LLVMBuild.txt_119793 LLVMBuild.txt
mv llvm.spec.in_119793 llvm.spec.in
mv projects_119793 projects
mv README.txt_119793 README.txt
mv RELEASE_TESTERS.TXT_119793 RELEASE_TESTERS.TXT
mv resources_119793 resources
mv runtimes_119793 runtimes
mv temp_52476 temp_524768888
mv test_119793 test
mv tools_119793 tools
mv unittests_119793 unittests
mv utils_119793 utils

91. llvm architecture (partial, to be extended)

IR--> SDAG Construction --> Type Legalization --> Operation Legalization --> Lowering --> ISel (Instruction Selection) --> MIR (Machine IR)

Thursday, 18 May 2017

BE VS LE data layout (&sldwi)

Say you have vector int a = {1,2,3,4} and vector int b = {5,6,7,8}.
The layout of a on a BE machine is pretty simple, just as it is written,
because BE means the high word is saved at the low address (which is also natural for array access):
Addr0x   0 1 2 3
BE:     [1,2,3,4]
On LE it is the opposite (the high word is saved at the high address), so element 0 ends up in the rightmost word:
Addr0x   3 2 1 0
LE:     [4,3,2,1]
Say you have xxsldwi(a,b,3).
On BE: it is just 0 1 2 3   4 5 6 7
                 [1,2,3,4] [5,6,7,8]
       The result RT = {4,5,6,7}.
       If you want to use vec_shuffle to implement xxsldwi, you have to pass in
       shuffle_vector((vector int)a, (vector int)b, 3,4,5,6); you can simply imagine that BE
       is accessed from left to right.
       If you do
for (int i = 0; i < 4; i++)
{
    printf("RT[%d]=%d\n\n",i, RT[i]);
}
   the output is just:
   RT[0]=4

   RT[1]=5

   RT[2]=6

   RT[3]=7

But on LE, for the same arrays a and b,
           it is just: 3 2 1 0   7 6 5 4
                      [4,3,2,1] [8,7,6,5]

       The result RT = [1,8,7,6] in the register.
       So no matter what, for xxsldwi(a,b,3), word[3] of A, followed by
       word[0], word[1] and word[2] of B, will be put into the result vector register.
       If you want to use vec_shuffle to implement xxsldwi, you have to pass in
       shuffle_vector((vector int)a, (vector int)b,5,6,7,0); you can simply imagine that LE
       is accessed from right to left.
       If you do
for (int i = 0; i < 4; i++)
{
    printf("RT[%d]=%d\n\n",i, RT[i]);
}
   the output is just:
   RT[0]=6

   RT[1]=7

   RT[2]=8

   RT[3]=1
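
The BE and LE results above can be reproduced with a small software model. This is a sketch under assumptions: `xxsldwi_words` and `reverse_words` are illustrative helpers, not compiler builtins, and the arrays hold the four words in register order (word 0 on the left).

```c
#include <stdint.h>

/* Register-level model: concatenate a || b (8 words) and take the 4 words
   starting at index shw, i.e. shift left by shw words. */
void xxsldwi_words(const int32_t a[4], const int32_t b[4],
                   unsigned shw, int32_t out[4]) {
    int32_t concat[8];
    for (int i = 0; i < 4; i++) {
        concat[i] = a[i];
        concat[i + 4] = b[i];
    }
    for (int i = 0; i < 4; i++) {
        out[i] = concat[(shw & 3) + i];
    }
}

/* On LE, element i of a vector lives in register word 3 - i, so reversing
   converts between element order and register word order. */
void reverse_words(const int32_t in[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = in[3 - i];
    }
}
```

For BE, calling `xxsldwi_words` directly on {1,2,3,4} and {5,6,7,8} with a shift of 3 models the register. For LE, reverse both inputs first, apply the same shift, and reverse the result to get back to element order.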

Saturday, 6 May 2017

XXPERMDI/xxpermdi (VSX Permute Doubleword Immediate ) and its extended mnemonics XX3-form

22. Extended Mnemonic for xxpermdi

Extended Mnemonic Equivalent To
xxspltd T,A,0 <=> xxpermdi T,A,A,0b00
xxspltd T,A,1 <=> xxpermdi T,A,A,0b11
xxmrghd T,A,B <=> xxpermdi T,A,B,0b00
xxmrgld T,A,B <=> xxpermdi T,A,B,0b11
xxswapd T,A <=> xxpermdi T,A,A,0b10
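
The mnemonic table is easier to read with a model of what the 2-bit immediate selects: the high bit picks the doubleword taken from A, the low bit picks the doubleword taken from B. A minimal sketch, assuming a made-up `Vsr` struct with doubleword 0 as the most significant half; the C functions only borrow the mnemonic names and are not intrinsics.

```c
#include <stdint.h>

/* Software stand-in for a 128-bit VSR: dword[0] is the most significant. */
typedef struct {
    uint64_t dword[2];
} Vsr;

/* dm is the 2-bit immediate: high bit selects from a, low bit from b. */
Vsr xxpermdi(Vsr a, Vsr b, unsigned dm) {
    Vsr t;
    t.dword[0] = a.dword[(dm >> 1) & 1];
    t.dword[1] = b.dword[dm & 1];
    return t;
}

/* Extended mnemonics expressed through the base operation. */
Vsr xxmrghd(Vsr a, Vsr b) { return xxpermdi(a, b, 0); } /* 0b00 */
Vsr xxmrgld(Vsr a, Vsr b) { return xxpermdi(a, b, 3); } /* 0b11 */
Vsr xxswapd(Vsr a)        { return xxpermdi(a, a, 2); } /* 0b10 */
```

With A = {10, 11} and B = {20, 21}, the merge-high form gives {10, 20}, the merge-low form gives {11, 21}, and swapping A gives {11, 10}.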

Monday, 27 March 2017

How to dump scheduling graph?

78. How to dump scheduling graph?
/home/jtony/git-llvm/build/team-llvm/bin/llc /home/jtony/scrum/s11/memcpy/jtony/ppc.ll -view-misched-dags 2>&1 | /home/sfertile/bin/capture_dot
test/CodeGen/PowerPC/reduced.ll

79. How to show hidden llc options?
llc -help-hidden
like:
-view-misched-dags                                       - Pop up a window to show MISched dags after they are processed
-view-sched-dags                                         - Pop up a window to show sched dags as they are processed
-view-sunit-dags                                         - Pop up a window to show SUnit dags after they are processed

Sunday, 26 March 2017

What is llvm-dis?

The llvm-dis command is the LLVM disassembler. It takes an LLVM bitcode file and converts it into human-readable LLVM assembly language.

If the input is being read from standard input, then llvm-dis will send its output to standard output by default. Otherwise, the output will be written to a file named after the input file, with a .ll suffix added (any existing .bc suffix will first be removed). You can override the choice of output file using the -o option.

Saturday, 25 March 2017

Iterating over the lines, words, and characters in a file

1. Iterate over each line in a file

The while-loop approach:

while read line;
do
echo $line;
done < file.txt

As a subshell:

cat file.txt | (while read line;do echo $line;done)

The awk approach:

cat file.txt| awk '{print}'

2. Iterate over each word in a line

for word in $line;
do
echo $word;
done

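3. Iterate over each character

${string:start_pos:num_of_chars}: extracts characters from a string (bash string slicing); with num_of_chars set to 1 it extracts a single character.

${#word}: returns the length of the variable word

for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done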


Thursday, 23 March 2017

LDUX/ldux ( Load Doubleword with Update Indexed)

Load Doubleword with Update Indexed
X-form
ldux RT,RA,RB

31 RT RA RB 53 /
0 6 11 16 21 31

EA <= (RA) + (RB)
RT <= MEM(EA, 8)
RA <= EA

Let the effective address (EA) be the sum (RA)+ (RB).
The doubleword in storage addressed by EA is loaded
into RT.
EA is placed into register RA.
If RA=0 or RA=RT, the instruction form is invalid.
Special Registers Altered:
None
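
The update form is easy to misread, so here is a small software model of the three register transfers. It is a sketch under assumptions: the `Regs` struct, the flat `mem` array and the `-1` return for the invalid forms are illustrative choices, and the loaded bytes are interpreted in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Toy register file: 32 general-purpose 64-bit registers. */
typedef struct {
    uint64_t gpr[32];
} Regs;

/* Models ldux RT,RA,RB against a flat byte array.
   Returns 0 on success, -1 for the invalid forms (RA=0 or RA=RT). */
int ldux(Regs *r, const uint8_t *mem, unsigned rt, unsigned ra, unsigned rb) {
    if (ra == 0 || ra == rt) {
        return -1;
    }
    uint64_t ea = r->gpr[ra] + r->gpr[rb]; /* EA <= (RA) + (RB) */
    uint64_t value;
    memcpy(&value, mem + ea, sizeof value); /* RT <= MEM(EA, 8)  */
    r->gpr[rt] = value;
    r->gpr[ra] = ea;                        /* RA <= EA          */
    return 0;
}
```

After a successful call, RA holds the effective address, which is what makes the instruction convenient for walking through memory with a fixed stride held in RB.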

D-Form VS X-Form

20. D-Form VS X-Form
D-Form is the immediate/register addressing mode.
X-Form is the register/register addressing mode.
They are otherwise known as unindexed and indexed loads, respectively.
So if you have a series of consecutive loads, the D-Form loads have a huge advantage,
because with X-Form loads the offset has to be in a register, so you get consecutive
offsets only by either using multiple registers or having instructions in between to increment the offset.

call back function

C Callbacks

Callbacks have a wide variety of uses, for example in error signaling: a Unix program might not want to terminate immediately when it receives SIGTERM, so to make sure that its termination is handled properly, it would register the cleanup function as a callback. Callbacks may also be used to control whether a function acts or not: Xlib allows custom predicates to be specified to determine whether a program wishes to handle an event.
The following C code demonstrates the use of callbacks to display two numbers.

#include <stdio.h>
#include <stdlib.h>

/* The calling function takes a single callback as a parameter. */
void PrintTwoNumbers(int (*numberSource)(void)) {
    printf("%d and %d\n", numberSource(), numberSource());
}

/* A possible callback */
int overNineThousand(void) {
    return (rand()%1000) + 9001;
}

/* Another possible callback. */
int meaningOfLife(void) {
    return 42;
}

/* Here we call PrintTwoNumbers() with three different callbacks. */
int main(void) {
    PrintTwoNumbers(&rand);
    PrintTwoNumbers(&overNineThousand);
    PrintTwoNumbers(&meaningOfLife);
    return 0;
}

This should provide output similar to:

125185 and 89187225
 9084 and 9441
 42 and 42

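From

Zhihu

Author: 桥头堡
Link: https://www.zhihu.com/question/19801131/answer/27459821
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, credit the source.

What is a callback function?
Let us take a slight detour to answer this question.
Programming falls into two categories: system programming and application programming. System programming, simply put, means writing libraries; application programming means using the available libraries to write programs that serve some purpose, that is, applications. System programmers leave interfaces in the libraries they write, namely APIs (application programming interfaces), for application programmers to use. So in a diagram of abstraction layers, the library sits beneath the application.
When a program runs, the application program usually calls functions prepared in advance in the library through the API. Some library functions, however, require the application to pass them a function first, to be called at the appropriate moment in order to complete the task. This function, which is passed in and later called, is the callback function.
As an analogy, a hotel offers a wake-up service but asks guests to decide how they want to be woken. It can be a call to the room phone or a staff member knocking on the door; heavy sleepers afraid of missing something important can even ask to have a basin of water poured over their head. Here the act of "waking up" is provided by the hotel and corresponds to the library function, while the wake-up method is decided by the guest and told to the hotel, which corresponds to the callback function. The act of the guest telling the hotel how to wake them, i.e. passing the callback function into the library function, is called registering a callback function.
[Figure: callback diagram; image source: Wikipedia]

As you can see, the callback function is usually at the same abstraction layer as the application (because which callback to pass in is decided at the application level). The callback thus becomes a process in which a higher layer calls a lower layer, and the lower layer in turn calls back into the higher layer. (I believe) this is where callbacks were first used, and also the reason for the name.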
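
A familiar instance of a library function that takes a callback is `qsort` from the C standard library: the library owns the sorting, and the application supplies the comparison. A short sketch, where `compare_ints` and `sort_ints` are names chosen here for illustration:

```c
#include <stdlib.h>

/* The callback: qsort calls this whenever it needs to order two elements. */
int compare_ints(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b); /* negative, zero or positive; no overflow */
}

/* Application code registers the callback by passing it to the library. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```

In the hotel analogy, `qsort` is the wake-up service and `compare_ints` is the wake-up method chosen by the guest.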

LXVD2X/lxvd2x (Load VSX Vector Doubleword*2 Indexed)

Load VSX Vector Doubleword*2 Indexed

X-form
lxvd2x XT,RA,RB

31 T RA RB 844 TX
0 6 11 16 21 31
if MSR.VSX=0 then VSX_Unavailable()
EA <= RA=0 ? GPR[RB] : GPR[RA] + GPR[RB]
VSR[32×TX+T].dword[0] <= MEM(EA, 8)
VSR[32×TX+T].dword[1] <= MEM(EA+8, 8)

Let XT be the value 32×TX + T.
Let EA be the sum of the contents of GPR[RA], or 0 if RA
is equal to 0, and the contents of GPR[RB].
For each integer value i from 0 to 1, do the following.

When Big-Endian byte ordering is employed, the
contents of the doubleword in storage at address
EA+8×i are placed into load_data in such an order
that;
– the contents of the byte in storage at address
EA+8×i are placed into byte element 0 of
load_data,
– the contents of the byte in storage at address
EA+8×i+1 are placed into byte element 1 of
load_data, and so forth until
– the contents of the byte in storage at address
EA+8×i+7 are placed into byte element 7 of
load_data.
When Little-Endian byte ordering is employed, the
contents of the doubleword in storage at address
EA+8×i are placed into load_data in such an order
that;
– the contents of the byte in storage at address
EA+8×i are placed into byte element 7 of
load_data,
– the contents of the byte in storage at address
EA+8×i+1 are placed into byte element 6 of
load_data, and so forth until

How to use bugpoint to reduce an IR test case?

(1) Find the temp files created by clang in the crash report. Usually, it is at the end of the crash report, which
points to a file in the /tmp/ directory. for example:
clang-5.0: note: diagnostic msg: /tmp/tsan_mutexset-bb0974.cpp
clang-5.0: note: diagnostic msg: /tmp/tsan_mutexset-bb0974.sh

(2) add '-S -emit-llvm -o tony.ll' to the script (tsan_mutexset-bb0974.sh in this example) to produce an .ll file.

(3) Then write a simple script like this:
```
#!/bin/bash
<broken-llc> $1 2>&1 | grep "<text of the assertion>" > /dev/null
if [[ $? -eq 0 ]]
then
exit 1
else
exit 0
fi
```

Example:
    $ cat reduceScript.sh
    ```
    #!/bin/bash
    /home/jtony/llvm/build/memcpyBootstrapSeanPatch/bin/llc $1 2>&1 | grep "Node emitted out of order - late" > /dev/null
    if [[ $? -eq 0 ]]
    then
       exit 1
    else
       exit 0
    fi
    ```

(4) Once you have both the IR file (tony.ll) and the script (reduceScript.sh), run the following to reduce the given *.ll file:
`$ bugpoint -compile-custom -compile-command=reduceScript.sh tony.ll`

(5) Use llvm-dis to generate the eventual reduced IR (*.ll) file from the bitcode file (*.bc).
For example:
llvm-dis bugpoint-reduced-simplified.bc

Feel free to update this initial bugpoint document.

Wednesday, 22 March 2017

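awk, a data-stream processing tool (b)

Filtering the lines awk processes with patterns

awk 'NR < 5' # line number less than 5

awk 'NR==1,NR==4 {print}' file # print lines 1 through 4 (i.e. 1, 2, 3, 4)
awk 'NR==1;NR==4 {print}' file # print lines 1 and 4

awk '/linux/' # lines containing the text linux (the pattern can be a regular expression, which is super powerful; don't forget the two slashes //)

awk '!/linux/' # lines not containing the text linux

Setting the delimiter

Use -F to set the delimiter (the default is whitespace)

awk -F: '{print $NF}' /etc/passwd

Reading command output

Use getline to read the output of an external shell command into the variable cmdout:

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'

Using loops in awk

for (( i=2; i <= $max; ++i )); do echo "$i"; done
The above is just a normal bash for loop, not an awk loop.

TO INVESTIGATE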

Tuesday, 21 March 2017

Variance and standard deviation

s = (s1 + s2 + ... + sn)/n
Variance = [(s1 - s)^2 + (s2 - s)^2 + ... + (sn - s)^2]/n
Standard Deviation = Variance^(1/2)
In Excel, the variance is calculated by the function VAR(C1:Cn), and the standard deviation
is calculated by STDEV(C1:Cn). Note that VAR and STDEV treat the data as a sample and divide by n - 1; the formulas above, which divide by n, correspond to VAR.P and STDEV.P.

STDEV.S assumes that its arguments are a sample of the population. If your data represents the entire population, then compute the standard deviation using STDEV.P.

If I remember correctly, STDEV.S was introduced to give a more accurate estimate when sampling from a large pool, when you don't have the whole population.

http://www.mathsisfun.com/data/standard-deviation.html
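
The difference between the population and the sample formulas is just the divisor, which a short C sketch makes concrete. The names `mean_of` and `variance` and the `ddof` parameter (0 for population, 1 for sample) are illustrative choices; the standard deviation is the square root of the returned value, e.g. via `sqrt()` from `<math.h>`.

```c
#include <stddef.h>

/* Arithmetic mean of n values (n must be greater than 0). */
double mean_of(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Variance with divisor n - ddof: ddof = 0 gives the population variance,
   ddof = 1 gives the sample variance. Requires n > ddof. */
double variance(const double *x, size_t n, size_t ddof) {
    double m = mean_of(x, n);
    double squares = 0.0;
    for (size_t i = 0; i < n; i++) {
        squares += (x[i] - m) * (x[i] - m);
    }
    return squares / (double)(n - ddof);
}
```

For the data set 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7.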

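awk, a data-stream processing tool (a)

Structure of an awk script

awk ' BEGIN{ statements } statements2 END{ statements } '

How it works

1. Execute the statement block in BEGIN;

2. Read a line from the file or stdin and execute statements2; repeat this until the whole file has been read;

3. Execute the END statement block.

print: printing the current line

print with no arguments prints the current line:

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'

When the arguments of print are separated by commas, they are printed separated by spaces:

echo | awk ' {var1 = "v1" ; var2 = "V2"; var3="v3"; \
print var1, var2 , var3; }'

$>v1 V2 v3

Concatenating with "-" (a string in double quotes acts as the joiner):

echo | awk ' {var1 = "v1" ; var2 = "V2"; var3="v3"; \
print var1"-"var2"-"var3; }'

$>v1-V2-v3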

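Special variables: NR NF $0 $1 $2

NR: the number of records; during execution it corresponds to the current line number.

NF: the number of fields; during execution it corresponds to the number of fields in the current line.

$0: the full text of the current line during execution.

$1: the text of the first field, i.e. the first column.

$2: the text of the second field, i.e. the second column.
$3: the text of the third field, i.e. the third column.
... $n: the text of the n-th field, i.e. the n-th column.

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'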

Print the second and third fields of every line:

awk '{print $2, $3}' file

Count the number of lines in a file:

awk ' END {print NR}' file

Sum the first field of every line:

echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0 ;
print "begin";} {sum += $1;} END {print "=="; print sum }'

Passing external variables

var=1000

echo | awk '{print vara}' vara=$var # input comes from stdin

awk '{print vara}' vara=$var file # input comes from a file

Monday, 20 March 2017

summarizeSpecTimes.sh

Borrowed from Nemanja, need to learn.

#!/usr/bin/env bash

if [[ -z "$3" ]]     # here -z means: True if string is empty. from 'help test' doc
then
echo "Usage: $0 -o <DIR> Baseline:<SPEC.out.csv> [<NAME>:<SPEC.out.csv> ...]"
echo
echo "This script creates a csv file that contains a summary of multiple SPEC"
echo "run result csv files. The baseline is assumed to be the very first file"
echo "passed in."
echo "The -o option is mandatory and must appear prior to the list of files."
echo
echo "Sample invocation:"
echo "$0 -o Summaries Baseline:CINT2006.110.ref.csv \\"
echo " Baseline:CFP2006.110.ref.csv NoCRBits:CINT2006.112.ref.csv \\"
echo " NoCRBits:CFP2006.112.ref.csv CheapBR:CINT2006.111.ref.csv \\"
echo " CheapBR:CFP2006.111.ref.csv"
echo
echo "Implementation detail: For processing the inputs, the script will"
echo "create a directory for each of the named runs which it will clean"
echo "up after. If this directory happens to contain directories of the"
echo "same name, the script will prompt you before overwriting them."
exit 1
fi

###########################SCRIPT BEGINS ON LINE 78#############################

function summarizeBench {
SUMMARY=""
if [ $(cat CurrBenchRunTimes.txt | wc -l) -eq 1 ]
then
    cat CurrBenchRunTimes.txt CurrBenchRunTimes.txt > tmpCurrBenchRunTimes.txt
    mv tmpCurrBenchRunTimes.txt CurrBenchRunTimes.txt
fi
while read RT
do
    SUMMARY="$SUMMARY $RT"
done < CurrBenchRunTimes.txt
echo $1,$($SUMMARIZE -a $SUMMARY)
}

function summarizeIndividualFile {
START=0
PREV_BENCH=""
while IFS=, read BENCH REF_T RUN_T RATIO REST
do
    if [ "$BENCH" = Benchmark ]
    then
      START=1
      continue
    fi
    if [ $START -ne 1 ]
    then
      continue
    fi
    if echo $REST | grep ',NR,' > /dev/null
    then
      continue
    fi
    if [[ -z "$BENCH" ]]
    then
      summarizeBench $PREV_BENCH
      break
    fi
    if [ "$BENCH" = "$PREV_BENCH" ]
    then
      echo $RUN_T >> CurrBenchRunTimes.txt
    else
      if [[ -n "$PREV_BENCH" ]]
      then
        summarizeBench $PREV_BENCH
      fi
      PREV_BENCH=$BENCH
      echo $RUN_T > CurrBenchRunTimes.txt
    fi
done < $FILE_TO_READ
}

function addNamedSummary {
echo "$(head -1 $OUTDIR/FinalSPECSummary.csv),$1(Median),$1(Best),$1(Worst),$1(%Variance),$1(%Diff(Median)),$1(%Diff(Best)),$1(%Diff(Worst))" > tmpSPECSummarizer.txt
cat $1/* | while IFS=, read BENCH MEDIAN BEST WORST VARIANCE
    do
      BASE_LINE=$(grep ^$BENCH $OUTDIR/FinalSPECSummary.csv)
      BASE_MEDIAN=$(echo $BASE_LINE | cut -f2 -d,)
      BASE_BEST=$(echo $BASE_LINE | cut -f3 -d,)
      BASE_WORST=$(echo $BASE_LINE | cut -f4 -d,)

      DIFF_MEDIAN=$($SUMMARIZE -d $BASE_MEDIAN $MEDIAN)
      DIFF_BEST=$($SUMMARIZE -d $BASE_BEST $BEST)
      DIFF_WORST=$($SUMMARIZE -d $BASE_WORST $WORST)

      echo $(grep ^$BENCH $OUTDIR/FinalSPECSummary.csv),$MEDIAN,$BEST,$WORST,$VARIANCE,$DIFF_MEDIAN,$DIFF_BEST,$DIFF_WORST
    done >> tmpSPECSummarizer.txt
    mv tmpSPECSummarizer.txt $OUTDIR/FinalSPECSummary.csv
}

function cleanupIfNeeded {
grep $RUN_NAME SPECSummarizerDirectories.txt > /dev/null
UNSEEN_DIR=$?
if [ $UNSEEN_DIR -ne 0 ]
then
    echo $RUN_NAME >> SPECSummarizerDirectories.txt
    ls $RUN_NAME > /dev/null 2>&1
    if [ $? -eq 0 ]
    then
      echo "Directory $RUN_NAME already exists. Overwrite (Y/N)?"
      read ANS<&1
      if echo "$ANS" | grep -i ^y
      then
        echo Overwriting...
        rm -Rf $RUN_NAME
      else
        exit 1
      fi
    fi
fi
}

################################SCRIPT BEGINS###################################
if [ "$1" != "-o" ]
then
echo "The -o option is mandatory as the first argument."
exit 1
fi
shift
OUTDIR=$1
shift
ls $OUTDIR > /dev/null 2>&1 || mkdir $OUTDIR
if [[ $? -ne 0 ]]
then
echo "Unable to create directory '$OUTDIR' that you specified as the output directory."
exit 1
fi

# Build the summarizer executable
if which summarize >/dev/null 2>&1
then
SUMMARIZE=$(which summarize)
else
START_AT=$(grep -n '^#include' $0 | head -1 | cut -f1 -d:)
END_AT=$(cat $0 | wc -l)
CPROG_LINES=$(expr $END_AT - $START_AT)
((CPROG_LINES += 1))
tail -$CPROG_LINES $0 > /tmp/summarize.cpp
g++ /tmp/summarize.cpp -o summarize
if [[ $? -ne 0 ]]
then
    rm -f /tmp/summarize.cpp
    exit 1
fi
SUMMARIZE=./summarize
fi

rm -f SPECSummarizerDirectories.txt 2>/dev/null
# Summarize each of the individual files and put the summaries in separate dirs
while [[ -n "$1" ]]
do
touch SPECSummarizerDirectories.txt
FILE_TO_READ=${1#*:}
RUN_NAME=${1%:*}
cleanupIfNeeded
mkdir $RUN_NAME 2>/dev/null
grep $RUN_NAME SPECSummarizerDirectories.txt > /dev/null || echo $RUN_NAME >> SPECSummarizerDirectories.txt
summarizeIndividualFile > $RUN_NAME/$FILE_TO_READ.SPECSummarizerSummary.txt
echo "$RUN_NAME" > $OUTDIR/$RUN_NAME.$FILE_TO_READ
cat $FILE_TO_READ >> $OUTDIR/$RUN_NAME.$FILE_TO_READ
shift
done

echo "Benchmark,Baseline(Median),Baseline(Best),Baseline(Worst),Baseline(%Variance)" > $OUTDIR/FinalSPECSummary.csv
cat $(head -1 SPECSummarizerDirectories.txt)/* >> $OUTDIR/FinalSPECSummary.csv

# Combine all the individual summaries into one csv file
I=0
cat SPECSummarizerDirectories.txt | while read DIR
do
    ((I += 1))
    if [ $I -eq 1 ]
    then
      continue
    fi
# The first one is the baseline, skip it
    echo Summarizing $DIR
    addNamedSummary $DIR
done

# Add all the individual run summary files to the full summary
cat SPECSummarizerDirectories.txt | while read DIR
do
    echo "$DIR" >> $OUTDIR/FinalSPECSummary.csv
    echo "Benchmark,median,best,worst,variance" >> $OUTDIR/FinalSPECSummary.csv
    cat $DIR/* >> $OUTDIR/FinalSPECSummary.csv
done
# Clean up
rm -Rf $(cat SPECSummarizerDirectories.txt)
rm -f /tmp/summarize.cpp ./summarize ./CurrBenchRunTimes.txt SPECSummarizerDirectories.txt
echo "Result is in file $OUTDIR/FinalSPECSummary.csv"

################################SCRIPT ENDS#####################################
exit 0
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <algorithm>

double getMedian(std::vector<double> &Vec) {
std::sort(Vec.begin(), Vec.end());
int Size = Vec.size();
if (!Size) return 0.;
if (Size % 2)
    return Vec[Size/2];
return (Vec[Size/2] + Vec[Size/2-1]) / 2;
}

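// Assumes Vec is already sorted and non-empty. Despite the name, this returns
// the spread (max - min) as a percentage of the median, not a statistical variance.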
double getVariance(const std::vector<double> &Vec, double Median) {
double Min = Vec[0];
double Max = Vec[Vec.size()-1];
return (Max - Min) / Median * 100.;
}

int main(int argc, const char **argv) {
if (argc < 4) {
    fprintf(stderr, "Usage: %s [opt] <time1> <time2> [<time3>...]\n", argv[0]);
    fprintf(stderr, "       opt is one of -a or -d (for all or diff).\n");
    fprintf(stderr, "       The output for -a is: median,best,worst,variance.\n");
    fprintf(stderr, "       The output for -d is: ((time2-time1)/time1*100)%%.\n");
    return 1;
}

// Computing the diff
if (!strcmp(argv[1], "-d")) {
    double T1 = strtod(argv[2], NULL);
    double T2 = strtod(argv[3], NULL);
    printf("%.2f%%\n", (T2-T1)/T1*100);
    return 0;
} else if (strcmp(argv[1], "-a")) {
    fprintf(stderr, "Unrecognized option %s\n", argv[1]);
    return 1;
}

std::vector<double> RTSet;
int i = 2;
for (; i < argc; i++) {
    RTSet.push_back(strtod(argv[i], NULL));
}
double Median = getMedian(RTSet);
printf("%.4f,%.4f,%.4f,%.2f%%\n", Median, RTSet[0], RTSet[RTSet.size()-1],
         getVariance(RTSet, Median));
return 0;
}

Sunday, 19 March 2017

sed: a powerful text substitution tool

Replacing the first occurrence

sed 's/text/replace_text/' file   # replaces the first match of text on each line

Global replacement

sed 's/text/replace_text/g' file

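By default sed prints the result of the substitution to standard output; to modify the original file in place, use -i:

sed -i 's/text/replace_text/g' file

Remove blank lines (here ^ matches the start of a line and $ matches the end):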

sed '/^$/d' file

Referencing the matched text

The matched string can be referenced with the & marker.

$ echo this is a test line | sed 's/\w\+/[&]/g'
[this] [is] [a] [test] [line]

In an actual test the brackets were not added; the cause still needs investigating:
$ echo this is en example | sed 's/\w+/[&]/g'
this is en example
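
One likely explanation for the result above: by default sed uses basic regular expressions, in which an unescaped + is an ordinary character, so \w+ looks for a word character followed by a literal plus sign. A minimal sketch of the variants, assuming GNU sed for \w and \+ (the -E form with a POSIX bracket expression should also work with BSD sed):

```shell
# Unescaped + in a basic regex is a literal plus sign, so nothing matches
echo this is a test line | sed 's/\w+/[&]/g'

# GNU extension: \+ means "one or more" in a basic regex
echo this is a test line | sed 's/\w\+/[&]/g'

# Extended regex (-E): + is a quantifier without the backslash
echo this is a test line | sed -E 's/[[:alnum:]_]+/[&]/g'
```

The first command prints the line unchanged; the other two wrap every word in brackets.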

Substring match markers

The content of the first parenthesized group is referenced with the marker \1:

sed 's/hello\([0-9]\)/\1/'
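
As a small illustration of \1, the sketch below keeps only the digit that follows hello. The input string is made up for the example.

```shell
# \( \) capture the digit; \1 in the replacement puts it back
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'
# prints: 7 world
```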

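Double-quote evaluation

sed expressions are usually wrapped in single quotes. Double quotes also work; with double quotes the shell evaluates the expression first:

sed 's/$var/HLLOE/'

With double quotes, shell variables can be used in both the sed pattern and the replacement string: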

eg:

p=patten

r=replaced

echo "line con a patten" | sed "s/$p/$r/g"

$>line con a replaced

Other examples

Inserting a character into a string: convert each line of the text (PEKSHA) into PEK/SHA

sed 's/^.\{3\}/&\//g' file
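
To try the insertion without a file, the same expression can be fed from echo. The second command is an equivalent variant, shown here as an assumption about preferred style, that uses | as the delimiter of the s command so the slash needs no escaping.

```shell
# & stands for the matched first three characters; \/ appends a slash
echo PEKSHA | sed 's/^.\{3\}/&\//'

# Same substitution with | as the delimiter
echo PEKSHA | sed 's|^.\{3\}|&/|'
```

Both commands print PEK/SHA.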

paste: joining text by column

Join two text files together column by column:

cat file1

1

2

cat file2

colin

book

paste file1 file2

1 colin

2 book

The default delimiter is a tab; use -d to specify a different one:

paste file1 file2 -d ","

1,colin

2,book
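
The paste steps above can be reproduced end to end with throwaway files. This is a minimal sketch; the temporary directory and its variable name are assumptions made for the example.

```shell
# Create the two sample files in a scratch directory
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/file1"
printf 'colin\nbook\n' > "$dir/file2"

# Default delimiter (tab)
paste "$dir/file1" "$dir/file2"

# Comma as delimiter; options are placed before the file names for portability
paste -d ',' "$dir/file1" "$dir/file2"

rm -r "$dir"
```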

Friday, 17 March 2017

performance analysis tool: Perf

Sampling with `perf record`

The perf tool can be used to collect profiles on a per-thread, per-process, and per-CPU basis.
There are several commands associated with sampling: record, report, and annotate. You must first collect the samples using perf record, which generates an output file called perf.data. That file can then be analyzed, possibly on another machine, using the perf report and perf annotate commands. The model is fairly similar to that of OProfile.
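
The record-then-report workflow can be sketched as follows. The helper name perf_profile, the output path, and the profiled command (sleep 1) are assumptions chosen for illustration. Sampling may also fail without sufficient permissions on some systems.

```shell
# Hypothetical helper: profile a command, then print a text summary
perf_profile() {
    out=$1
    shift
    # -g records call graphs; -o names the output file (default: perf.data)
    perf record -g -o "$out" -- "$@" || return 1
    # --stdio prints a plain-text report instead of opening the interactive UI
    perf report -i "$out" --stdio
}

# Other sampling scopes (not run here):
#   perf record -p <pid>                 # an existing process
#   perf record -t <tid>                 # a single thread
#   perf record -a -- sleep 5            # all CPUs, system-wide, for 5 seconds
#   perf record -C 0 -- sleep 5          # only CPU 0
#   perf annotate -i perf.data --stdio   # per-instruction view of the samples

if command -v perf >/dev/null 2>&1; then
    perf_profile /tmp/perf.data sleep 1 || echo "perf could not sample (permissions?)"
else
    echo "perf is not installed"
fi
```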

Thursday, 16 March 2017

To do list

To learn:
(1) VIM Macro
(2) summarizeSpecTimes.sh

Wednesday, 15 March 2017

STH/sth (Store Halfword)

Store Halfword D-form
sth RS,D(RA)
if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
MEM(EA, 2) ← (RS)48:63
Let the effective address (EA) be the sum (RA|0) + D.
(RS)48:63 are stored into the halfword in storage
addressed by EA.
Special Registers Altered:
None
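
The address computation above can be mimicked with shell arithmetic. This is only an illustrative sketch: the function name sth_sketch and its argument order are hypothetical, memory itself is not modelled, and 64-bit address wrap-around is ignored.

```shell
# sth_sketch RA_NUMBER RA_VALUE D RS_VALUE
# Prints the effective address and the halfword that would be stored.
sth_sketch() {
    ra_num=$1
    ra_val=$2
    d=$3
    rs=$4
    # (RA|0): register number 0 means "use zero", not the contents of r0
    if [ "$ra_num" -eq 0 ]; then b=0; else b=$ra_val; fi
    # EXTS(D): sign-extend the 16-bit displacement
    d=$(( d & 0xFFFF ))
    if [ "$d" -ge 32768 ]; then d=$(( d - 65536 )); fi
    ea=$(( b + d ))
    # (RS)48:63 are the low-order 16 bits of the 64-bit register
    half=$(( rs & 0xFFFF ))
    printf 'EA=0x%x halfword=0x%04x\n' "$ea" "$half"
}

sth_sketch 3 0x1000 0xFFFE 0x12345678   # D = -2, prints EA=0xffe halfword=0x5678
sth_sketch 0 0x1000 8 0xABCD            # RA = 0, prints EA=0x8 halfword=0xabcd
```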
