Tuesday, 3 January 2017

Shell Script 15 Regular Expressions - User Guide (c)

POSIX Character Class Definitions

POSIX 1003.2 section 2.8.3.2 (6) defines a set of character classes that denote certain common ranges. They tend to look very ugly but have the advantage that they also take into account the 'locale', that is, any variant of the local language/coding system. Many utilities/languages provide short-hand ways of invoking these classes. Strictly, the names used, and hence their contents, reference the LC_CTYPE POSIX definition (1003.2 section 2.5.2.1).
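Because the contents of a class follow the locale, the same expression can match different characters on different systems. The sketch below is one way to make the result reproducible, assuming a POSIX sh and sed: it sets LC_ALL=C so that [:upper:] means exactly A to Z. The function name keep_upper is made up for this illustration.

```shell
#!/bin/sh
# keep_upper is a hypothetical helper name, not part of any standard tool.
# It prints its argument with every character outside [:upper:] removed.
# LC_ALL=C pins the POSIX locale for this one sed call, so the class
# covers A to Z only.
keep_upper() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^[:upper:]]//g'
}

keep_upper "Hello World"   # prints HW
```

Without the LC_ALL=C prefix, sed uses the locale of the calling environment, and the class may then cover additional letters, depending on the locale data installed on the system.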

Value       Meaning

[:upper:]   Any uppercase alphabetic character, A to Z.
[:lower:]   Any lowercase alphabetic character, a to z.
[:digit:]   Only the digits 0 to 9.
[:blank:]   Space and TAB characters only.
[:xdigit:]  Hexadecimal digits 0-9, A-F, a-f.
[:punct:]   Punctuation symbols . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ ` { } | ~
[:cntrl:]   Control characters NUL SOH STX ETX EOT ENQ ACK BEL BS TAB NL (LF) VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS1 IS2 IS3 IS4 DEL.
[:space:]   Any whitespace character (space, TAB, NL, VT, FF, CR). Many systems abbreviate as \s.
[:alnum:]   Any alphanumeric character, 0 to 9, A to Z or a to z (the set defined by upper, lower and digit). Many systems abbreviate alnum plus the underscore as \w.
[:alpha:]   Any alphabetic character, A to Z or a to z (the set defined by upper and lower).
[:print:]   Any printable character (the set defined by alnum and punct) plus the single character SPACE.
[:graph:]   Any printable character (the set defined by alnum and punct), excluding the single character SPACE.

In a regular expression these are always used inside square brackets, in the form [[:alnum:]], or combined with other characters as [[:digit:]a-d].
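As a rough illustration of the two forms, the sketch below uses a class on its own with grep -E and the combined form [[:digit:]a-d] in a shell case pattern. It assumes a POSIX sh and grep; the function names is_hex and starts_digit_or_a_to_d are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the argument is a non-empty,
# single-line string made up only of hexadecimal digits.
is_hex() {
    printf '%s\n' "$1" | grep -Eq '^[[:xdigit:]]+$'
}

# Hypothetical helper: succeeds when the argument starts with a digit
# or one of the letters a to d (the combined form [[:digit:]a-d]).
starts_digit_or_a_to_d() {
    case $1 in
        [[:digit:]a-d]*) return 0 ;;
        *)               return 1 ;;
    esac
}

is_hex "DEADbeef42" && echo "DEADbeef42: hex"
is_hex "xyz"        || echo "xyz: not hex"
starts_digit_or_a_to_d "c3po" && echo "c3po: starts with a digit or a-d"
```

The outer pair of brackets is the bracket expression itself and the inner [:name:] selects the class, which is why further characters or ranges such as a-d can sit alongside it.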
