Bash Patterns: Difference between revisions

From NovaOrdis Knowledge Base

Latest revision as of 18:11, 4 April 2021

Internal

  • bash Parameter and Variable Expansion
  • Regular Expressions

Overview

Bash patterns are a simple pattern-matching (glob) language, related to but distinct from regular expressions, defined by the metacharacters and rules described below.

Variable expansion can be used inside patterns:

a="something"
b="some"
echo ${a#${b}}

prints "thing".
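
Whether the inner expansion acts as a pattern or as a literal string depends on how it is quoted. The sketch below assumes a hypothetical value of b that contains a metacharacter; the names unquoted and quoted are illustrative only.

```bash
#!/usr/bin/env bash
a="something"
b="some*h"                 # hypothetical value containing the * metacharacter

# Inner expansion left unquoted: the value of b is used as a pattern,
# so "some*h" matches the shortest prefix "someth".
unquoted="${a#${b}}"       # ing

# Inner expansion quoted: the value of b is taken literally,
# and "something" does not start with the literal text "some*h".
quoted="${a#"${b}"}"       # something

echo "$unquoted"
echo "$quoted"
```

Quoting the inner expansion is usually the safer default when the variable holds arbitrary text rather than an intended pattern.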

Metacharacters

The following characters have a special meaning when a bash pattern is evaluated, and they need to be escaped to be matched literally:

*

* (star) matches any string, including the null string. When the globstar shell option is enabled, and ‘*’ is used in a filename expansion context, two adjacent ‘*’s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a ‘/’, two adjacent ‘*’s will match only directories and subdirectories.

?

? (question mark) matches any single character. The dot ('.') is not a metacharacter.
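
The behavior of * and ? can be checked with a case statement, which matches the whole string against the pattern. This is a minimal sketch; check is a hypothetical helper, not a bash builtin.

```bash
#!/usr/bin/env bash
# check STRING PATTERN: prints "yes" if the whole STRING matches PATTERN, "no" otherwise.
# Hypothetical helper; the unquoted $2 is deliberately treated as a pattern.
check() {
  case "$1" in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

star_empty=$(check ""      "*")       # yes: * also matches the null string
star_any=$(check "a.log"   "*.log")   # yes
one_char=$(check "b"       "?")       # yes
two_chars=$(check "ab"     "?")       # no: ? matches exactly one character
dot=$(check "axc"          "a.c")     # no: the dot only matches a literal dot

echo "$star_empty $star_any $one_char $two_chars $dot"
```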

[...]

Matches any one of the enclosed characters.

For example:

[- :]

matches '-', space and ':'.

A pair of characters separated by a hyphen denotes a range expression; any character that falls between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched. A digit is matched as follows:

[0-9]

A ‘-’ may be matched by including it as the first or last character in the set.

A ‘]’ may be matched by including it as the first character in the set.
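
The bracket expressions above can be tried out with pattern substitution and a case statement. This is only a sketch; the input values and variable names are hypothetical.

```bash
#!/usr/bin/env bash
stamp="2021-04-04 18:11"            # hypothetical input

# Replace every '-', space or ':' with an underscore.
normalized="${stamp//[- :]/_}"      # 2021_04_04_18_11

# Remove every digit using a range expression.
no_digits="${stamp//[0-9]/}"        # "-- :"

# A ']' placed first in the set is matched literally.
pat='[]x]'
case "]" in
  $pat) bracket=matched ;;
  *)    bracket=unmatched ;;
esac

echo "$normalized"
echo "$no_digits"
echo "$bracket"
```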

Negation

If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. The match is for one character only.

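To match a known number of characters, repeat the [!...] expression the known number of times. [!...]* does not match an arbitrary number of not-enclosed characters: in bash patterns * is not a repetition operator, so the expression matches exactly one non-enclosed character followed by any string. To repeat a bracket expression zero or more times, use the extended pattern *([!...]) described under Pattern Lists. This behavior is different from sed behavior on negation.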
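
The difference between a repeated [!...] expression, [!...]* and the extended form can be seen with prefix removal. This is a sketch on a hypothetical input string; the *( ... ) form assumes the extglob shell option is enabled.

```bash
#!/usr/bin/env bash
shopt -s extglob                     # needed only for the *( ... ) form below

s="abc123def"                        # hypothetical input

one="${s#[!0-9]}"                    # bc123def : exactly one non-digit removed
three="${s#[!0-9][!0-9][!0-9]}"      # 123def   : expression repeated three times
star="${s##[!0-9]*}"                 # (empty)  : one non-digit, then * takes the rest
run="${s##*([!0-9])}"                # 123def   : longest leading run of non-digits

echo "$one"
echo "$three"
echo "[$star]"
echo "$run"
```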

Classes

Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:

  • alnum
  • alpha
  • ascii
  • blank
  • cntrl
  • digit
  • graph
  • lower
  • print
  • punct
  • space
  • upper
  • word: matches letters, digits, and the character ‘_’.
  • xdigit

A character class matches any character belonging to that class.
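
Character classes are written inside a bracket expression, so the brackets are doubled. A short sketch with a hypothetical input string:

```bash
#!/usr/bin/env bash
s="user_01 logged-in"                  # hypothetical input

underscored="${s//[[:space:]]/_}"      # user_01_logged-in
alnum_only="${s//[![:alnum:]]/}"       # user01loggedin
no_digits="${s//[[:digit:]]/}"         # "user_ logged-in"

echo "$underscored"
echo "$alnum_only"
echo "$no_digits"
```

The second expansion combines a class with negation: [![:alnum:]] matches any single character that is not a letter or a digit.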

/

To match a '/' (slash) inside the pattern of a replacement construct such as variable expansion replacement, use '\/', because an unescaped '/' ends the pattern. There is no need to escape slashes in the replacement string.
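
Both cases can be sketched with pattern substitution; the path and variable names below are hypothetical.

```bash
#!/usr/bin/env bash
path="/usr/local/bin"        # hypothetical input

# Slash in the pattern: escaped as \/ so it is not read as the delimiter.
flat="${path//\//_}"         # _usr_local_bin

# Slash in the replacement string: no escaping needed.
name="usr-local-bin"
rebuilt="${name//-//}"       # usr/local/bin

echo "$flat"
echo "$rebuilt"
```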

~

~ (tilde)

'

' (single quote) must be escaped to match:

\'

"

" (double quote) must be escaped to match:

\"

;

; (semicolon) must be escaped to match:

\;

(...)

Parentheses must be escaped to match:

\(
\)
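
The escapes for quotes, semicolon and parentheses can be combined in a case statement. These characters need escaping mainly because the shell would otherwise interpret them as syntax before the pattern is evaluated. A minimal sketch; classify is a hypothetical helper.

```bash
#!/usr/bin/env bash
# classify CHAR: prints a name for a single character (hypothetical helper).
classify() {
  case "$1" in
    \')    echo "single quote" ;;
    \")    echo "double quote" ;;
    \;)    echo "semicolon" ;;
    \(|\)) echo "parenthesis" ;;
    *)     echo "other" ;;
  esac
}

q1=$(classify "'")
q2=$(classify '"')
q3=$(classify ';')
q4=$(classify '(')
q5=$(classify 'x')

echo "$q1, $q2, $q3, $q4, $q5"
```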

Non-Special Characters

These characters do not need to be escaped in bash patterns to match:

/ # forward slash (outside the pattern of a replacement construct, see / above)
. # dot - does NOT match any character, it just matches a dot
: # colon

Paths

See / above.

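Pattern Lists

The following extended pattern matching operators are recognized only when the extglob shell option is enabled (shopt -s extglob). A pattern-list is a list of one or more patterns separated by '|'.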

?(pattern-list)

Matches zero or one occurrence of the given patterns.

*(pattern-list)

Matches zero or more occurrences of the given patterns.

+(pattern-list)

Matches one or more occurrences of the given patterns.

@(pattern-list)

Matches one of the given patterns.

!(pattern-list)

Matches anything except one of the given patterns.
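
The five operators can be compared side by side with a case statement. This is a sketch that assumes extglob is enabled before the patterns are used; check is a hypothetical helper and the sample strings are illustrative.

```bash
#!/usr/bin/env bash
shopt -s extglob   # pattern lists are only recognized with extglob enabled

# check STRING PATTERN: prints "yes" if the whole STRING matches PATTERN, "no" otherwise.
# Hypothetical helper used only for this illustration.
check() {
  case "$1" in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

optional=$(check "color"    "colo?(u)r")     # yes: zero occurrences of "u"
optional2=$(check "colouur" "colo?(u)r")     # no:  more than one occurrence
star_none=$(check ""        "*(ab)")         # yes: zero occurrences
star_many=$(check "ababab"  "*(ab)")         # yes: three occurrences
plus_none=$(check ""        "+(ab)")         # no:  at least one is required
exactly=$(check "png"       "@(jpg|png)")    # yes: one of the alternatives
negated=$(check "gif"       "!(jpg|png)")    # yes: anything but jpg or png
negated2=$(check "jpg"      "!(jpg|png)")    # no

echo "$optional $optional2 $star_none $star_many $plus_none $exactly $negated $negated2"
```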