Friday, 14 September 2018

C++ Escape sequences


Escape sequence   Description                          Representation
\'                single quote                         byte 0x27 in ASCII encoding
\"                double quote                         byte 0x22 in ASCII encoding
\?                question mark                        byte 0x3f in ASCII encoding
\\                backslash                            byte 0x5c in ASCII encoding
\a                audible bell                         byte 0x07 in ASCII encoding
\b                backspace                            byte 0x08 in ASCII encoding
\f                form feed - new page                 byte 0x0c in ASCII encoding
\n                line feed - new line                 byte 0x0a in ASCII encoding
\r                carriage return                      byte 0x0d in ASCII encoding
\t                horizontal tab                       byte 0x09 in ASCII encoding
\v                vertical tab                         byte 0x0b in ASCII encoding
\nnn              arbitrary octal value                byte nnn
\xnn              arbitrary hexadecimal value          byte nn
\unnnn            universal character name             code point U+nnnn
                  (arbitrary Unicode value);
                  may result in several characters
\Unnnnnnnn        universal character name             code point U+nnnnnnnn
                  (arbitrary Unicode value);
                  may result in several characters




For example:
The following program outputs a form feed, which skips to the start of the next page. This applies mostly to terminals whose output device is a printer rather than a VDU (visual display unit).


#include <iostream>
int main()
{
   std::cout << "hello\fgoodbye" << std::endl;  
}


Tuesday, 14 August 2018

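C snprintf() function: writing formatted data to a string (compare sprintf())

Function: vsnprintf
Prototype: int vsnprintf(char *buffer, size_t max_count, const char *format, va_list vArgList);

Purpose: like vsprintf, with the max_count limit added.
Parameters: buffer is the destination, max_count its size in bytes, format the format string, and vArgList the argument list.
Return value: on success, returns the number of characters written to buffer, not counting the terminating '\0'. snprintf and vsnprintf never write more than size bytes (max_count above, including the terminating '\0'). If the output is truncated for that reason, the return value is the number of characters (not counting the terminating '\0') that would have been written had enough space been available. So a return value equal to or greater than size means the output written to buffer was truncated. If an error occurs during output, a negative value is returned.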

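Function: vsprintf

Prototype: int vsprintf(char *string, const char *format, va_list param);
Purpose: formats the arguments in param according to format and writes the result to string.
Parameters: param is a va_list of variable arguments.
Return value: normally the length of the generated string (excluding the '\0'); a negative value on error.

Converts data of various types to a string and stores it in the buffer.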

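Function: snprintf

Prototype: int snprintf(char *str, size_t size, const char *format, ...);

Purpose: formats the variable arguments (...) according to format and copies the resulting string to str. Because it returns the length the complete string would have, snprintf can be called first to find out how much memory is needed.

Parameters: str is the destination buffer, size its size in bytes (including the terminating '\0'), and format the format string.
Return value: on success, the length of the string that was to be written; a negative value on error.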


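In fact, the ostream of the LLVM C++ library uses __libcpp_snprintf_l (a wrapper of snprintf). Because I had not included the header that contains this function, include/__bsd_locale_fallbacks.h, into the different headers, a value written out to a string (ostream) and converted back (istream) came back changed. The cause was that every call went to the ASCII version of __libcpp_snprintf_l, which was compiled first, while the linker silently discarded the EBCDIC version (a stupid design). After putting __libcpp_snprintf_l into different namespaces (ASCII or EBCDIC) and recompiling, so that local.ebcdic.o and local.ascii.o contain the EBCDIC and ASCII versions of __libcpp_snprintf_l respectively, the problem was solved!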

Lesson learned: gdb is a wonder tool; without it I would be debugging this code forever. If you can use gdb, never fall back to printf!

Thursday, 12 July 2018

Strong and Weak Symbols in GCC

Motivation

 

 
Global variables are powerful but risk being altered carelessly. In most cases we can add the static modifier to a global variable so that it can only be altered within its own file. However, there are situations in which we have to share global variables across files. In that case we often run into an error in the linking phase, namely error: multiple definition. Some of these errors are obvious and easy to debug, while others can be really puzzling. Here is an example that I encountered.
Suppose we have two source files with the following content:
#include <stdio.h>
// main.c
int gvar;
int main(){
    printf("shared var is %d\n",gvar);
    return 0;
}

// aux.c
int gvar = 5;
then we run the command
gcc -o test main.c aux.c
and everything goes smoothly. That is, the code links without error: gvar is not reported as multiply defined. Note that gvar has no extern modifier and the program prints 5, which means the variable gvar is shared across the two files. This is strange, since we know that two global variables with the same name in one project normally cause a multiple definition error.
What is more, I found something even more interesting when we change the extension of these two files:
mv main.c main.cpp
mv aux.c aux.cpp
gcc -o test aux.cpp main.cpp #error:multiple definition
the compiler tells me that there is a multiple definition error for the variable gvar. How can this happen? The content of the files has not changed; we only changed the file names. An intuitive explanation is that the difference between C++ and C causes this puzzling behavior, since .cpp is the file type for C++ and .c is for C.

Strong and weak symbols

Actually, these strange phenomena are all caused by one feature provided by GCC, called strong and weak symbols. Global variables are divided into three kinds:
  • initialized to a non-zero value
  • initialized to zero
  • not initialized, just defined
In GCC, the first two kinds of global variable are called strong symbols and are stored in the .data and .bss sections respectively. The third kind is called a weak symbol and is placed in the COMMON section.
There are three rules that these variables must follow:
  • only one strong symbol is allowed with a given name
  • if there is one strong symbol and several weak symbols, the weak symbols are overridden by the strong symbol
  • if there are several weak symbols, the linker chooses the one with the largest size (memory occupation).
Now we can explain why the C version links without error. In aux.c we define a strong symbol gvar initialized to 5. In main.c we only define the variable gvar without initializing it, so it is a weak symbol. When the program is built with GCC, the gvar in main.c is overridden by the gvar in aux.c according to the second rule. Therefore the program links and prints 5. If we change main.c as follows, it also causes a multiple definition error.
#include <stdio.h>
// main.c
int gvar=0; // this is a strong symbol
int main(){
    printf("shared var is %d\n",gvar);
    return 0;
}
Wait, there is still one puzzling problem left. Why does the program cause a multiple definition error when the file name is changed?
Actually, when we change the file extension from .c to .cpp, GCC compiles the program using the rules of C++ instead of C. To answer the question, we therefore need to look at how GCC's handling of strong and weak symbols differs between .cpp and .c files.
Here is my conclusion. In a C program, if you define a global variable without initializing it, GCC regards it as a weak symbol. In a C++ program, however, such a variable is a strong symbol by default. That is, the line int gvar; in main.cpp defines a strong symbol. Since aux.cpp has another strong symbol with the same name, the linker reports the error.
If you want a weak symbol in a C++ program, you need to declare the variable weak explicitly. For example, if we write a C++ program like this,
#include <stdio.h>
// main.cpp
int __attribute__((weak)) gvar=2;
int main(){
    printf("shared var is %d\n",gvar);
    return 0;
}

// aux.cpp
int gvar = 5;
the program behaves like the C version.
To avoid bugs like this, we can use the -fno-common option provided by GCC, which makes it treat all such variables as strong symbols. (GCC 10 and later enable -fno-common by default, so the C example above links only when -fcommon is passed.) However, in some cases we have to use weak symbols (see the next section). Therefore, we should develop good coding habits. There are three rules we can follow:
  • eliminate all global variables (hard)
  • add static modifier for global variables, provide interfaces for accesses (medium)
  • initialize all global variables, such as zero (easy)

Purpose of strong and weak symbols

It seems that we should use strong symbols instead of weak symbols when programming, so why does GCC provide weak symbols? As far as I know, weak symbols are useful for library functions. For example, if the symbols in a library are weak, users can easily override some library functions for their own purposes. What's more, programmers can declare library functions as weak symbols: if the program is linked with the library, it can offer more powerful features; otherwise, it can still run without any errors. Here is a simple example.
#include <stdio.h>
#include <pthread.h>
__attribute__((weak)) int pthread_create( pthread_t*, const pthread_attr_t*,
    void*(*)(void*), void*);
int main()
{
    if (pthread_create)
    {
        printf("This is multi-thread version!\n");
    }
    else
    {
        printf("This is single-thread version!\n");
    }
    return 0;
}
If the program is not linked with the pthread library, it runs in single-thread mode; otherwise it can run in multi-thread mode.

Manage your global variables

If you have to use global variables, here is a convenient way to manage them. Create two files called global_var.h and global_var.c. Declare all global variables with the extern modifier in global_var.h, and initialize all of them in global_var.c. For instance,
// global_var.h
#ifndef GLOBAL_VAR
#define GLOBAL_VAR
extern int g_A;
extern char g_B;
#endif

// global_var.c
int g_A = 0;
char g_B = 'g';
When you need to use global variables in other files, such as main.c, simply include global_var.h and you will be able to access all of them.
// main.c
#include <stdio.h>
#include "global_var.h"
int main(){
    printf("var is %d\n",g_A);
    return 0;
}
In this way you can easily manage your global variables. However, be sure to use global variables as little as possible.


Tuesday, 29 May 2018

Advanced templates

https://github.com/wuye9036/CppTemplateTutorial
Chapter 1: there is no trick to it, it just takes practice

Reading notes: read up to 2.3 "即用即推导" (deduction at the point of use) (2018.05.29)

Thursday, 24 May 2018

Implement a case-insensitive string by overriding the traits parameter of the C++ basic_string template

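#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

// Character traits that compare characters without regard to case.
class CaseInsensitiveTraits: public std::char_traits<char> {
public:
    // std::tolower needs a value representable as unsigned char.
    static int lower (char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }

    static bool lt (char one, char two) {
        return lower(one) < lower(two);
    }

    static bool eq (char one, char two) {
        return lower(one) == lower(two);
    }

    static int compare (const char* one, const char* two, size_t length) {
        for (size_t i = 0; i < length; ++i) {
            if (lt(one[i], two[i])) return -1;
            if (lt(two[i], one[i])) return +1;
        }
        return 0;
    }

    // find must agree with eq; the inherited version is case-sensitive.
    static const char* find (const char* s, size_t length, char c) {
        for (size_t i = 0; i < length; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
};

typedef basic_string<char, CaseInsensitiveTraits> cistring;

int main(void) {
    cistring c1 = "HI!", c2 = "hi!";
    if (c1 == c2) {  // true: the traits ignore case
        cout << "Strings are equal." << endl;
    }

    cout << "c1 length is " << c1.size() << endl;
    cout << "Index of 'I' in c1: " << c1.find('I') << endl;
    cout << "Index of 'h' in c1: " << (int) c1.find('h') << endl;
}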

Wednesday, 12 July 2017

Hazard (computer architecture)

https://en.wikipedia.org/wiki/Hazard_(computer_architecture)


Background

Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being processed in the various stages of the pipeline, such as fetch and execute. There are many different instruction pipeline microarchitectures, and instructions may be executed out-of-order. A hazard occurs when two or more of these simultaneous (possibly out of order) instructions conflict.

Types

Data hazards

Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur:
  1. read after write (RAW), a true dependency
  2. write after read (WAR), an anti-dependency
  3. write after write (WAW), an output dependency
Consider two instructions i1 and i2, with i1 occurring before i2 in program order.

Read after write (RAW)

(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline.
Example
For example:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3

The first instruction is calculating a value to be saved in register R2, and the second is going to use this value to compute a result for register R4. However, in a pipeline, when operands are fetched for the 2nd operation, the results from the first will not yet have been saved, and hence a data dependency occurs.
A data dependency occurs with instruction i2, as it is dependent on the completion of instruction i1.

Write after read (WAR)

(i2 tries to write a destination before it is read by i1) A write after read (WAR) data hazard represents a problem with concurrent execution.
Example
For example:
i1. R4 <- R1 + R5
i2. R5 <- R1 + R2

In any situation with a chance that i2 may finish before i1 (i.e., with concurrent execution), it must be ensured that the result of register R5 is not stored before i1 has had a chance to fetch the operands.

Write after write (WAW)

(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard may occur in a concurrent execution environment.
Example
For example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3

The write back (WB) of i2 must be delayed until i1 finishes executing.

Structural hazards

A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A canonical example is a single memory unit that is accessed both in the fetch stage where an instruction is retrieved from memory, and the memory stage where data is written and/or read from memory.[3] They can often be resolved by separating the component into orthogonal units (such as separate caches) or bubbling the pipeline.

Control hazards (branch hazards)

Branching hazards (also termed control hazards) occur with branches. On many instruction pipeline microarchitectures, the processor will not know the outcome of the branch when it needs to insert a new instruction into the pipeline (normally the fetch stage).

What is idempotent?

Idempotent
Math def:  ƒ(ƒ(x)) ≡ ƒ(x)
Chinese: 幂等

For example, max is idempotent: max(x, x) = x, so max(x, x) = max(max(x, x), x).