Sed Regular Expressions
Revision as of 21:41, 1 April 2024
Internal
Meta-Characters - Special Characters (need to be escaped in regular expressions)
/
"
$ # unescaped signifies end of line
^ # unescaped signifies the beginning of a line
!
[
]
:
* # zero or more
. # dot
The single quote is a special case: instead of escaping it, match it by its ASCII hexadecimal value prefixed with \x (a GNU sed extension):
\x27
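A minimal sketch of the \x27 escape in use, assuming GNU sed (the \x escape is a GNU extension and other sed implementations may not support it). The input string and the variable name are made up for illustration.

```shell
# Assumption: GNU sed, which understands \xHH escapes in regular expressions.
# Replace every single quote with the letter Q.
quoted=$(echo "it's" | sed 's/\x27/Q/g')
echo "$quoted"   # prints "itQs"
```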
To use () for grouping, they need to be escaped:
\(...\)
More details in Grouping below.
Non-Special Characters (do not need to be escaped in regular expressions)
<
>
(
)
!
-
{
}
+ # literal in basic regular expressions (sed's default); it means "one or more" only in extended regular expressions (sed -E), or when written as \+ in GNU sed
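The behavior of + can be checked with a small experiment. This is only a sketch: the sample input and variable names are illustrative, and it assumes a sed that accepts the -E option (GNU and BSD sed both do).

```shell
# In basic regular expressions, + is an ordinary character,
# so 'a+' matches the two-character string "a+".
literal=$(echo "a+b aab" | sed 's/a+/X/g')
echo "$literal"    # prints "Xb aab"

# With -E (extended regular expressions), + means "one or more",
# so 'a+' matches each run of the letter a.
extended=$(echo "a+b aab" | sed -E 's/a+/X/g')
echo "$extended"   # prints "X+b Xb"
```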
Grouping
Use \( and \) for grouping. Parentheses must be escaped to be interpreted as grouping delimiters.
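As an illustration of escaped grouping, the sketch below captures two groups and swaps them with the back-references \1 and \2. The sample text and variable name are arbitrary.

```shell
# \(hello\) is group 1, \(world\) is group 2; the replacement swaps them.
swapped=$(echo "hello world" | sed 's/\(hello\) \(world\)/\2 \1/')
echo "$swapped"   # prints "world hello"
```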
Brackets
Brackets mean "any one of"
[ab]
will match "a" or "b".
<syntaxhighlight lang='bash'>
echo "blah" | sed 's/[ab]/x/g' # prints "xlxh"
</syntaxhighlight>
Brackets and Negation
Match any character except the specified ones. Followed by *, the bracket expression matches a run of zero or more such characters (this behavior is different from the behavior of bash patterns on negation):
[^abc]*
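A short sketch of the negated bracket expression in a substitution; the input string is invented for the example. The leading run of characters that are not a, b, or c is replaced as a whole.

```shell
# [^abc]* matches the longest run of characters other than a, b, c
# starting at the beginning of the line: here "xyz".
negated=$(echo "xyzabc" | sed 's/[^abc]*/N/')
echo "$negated"   # prints "Nabc"
```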
Match Zero or One Character
Normally, this would be achieved with ? placed after the character or the group of characters in question, but this does not work with standard sed, where an unescaped ? is an ordinary character.
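Two possible workarounds, sketched under the assumption that the goal is "zero or one occurrence of the preceding character": the interval expression \{0,1\} from basic regular expressions, or the ? operator under -E (extended regular expressions, accepted by GNU and BSD sed). The sample words are illustrative.

```shell
# Basic regular expressions: \{0,1\} means "zero or one of the preceding item".
interval=$(echo "color colour" | sed 's/colou\{0,1\}r/X/g')
echo "$interval"   # prints "X X"

# Extended regular expressions: ? has its usual "zero or one" meaning.
optional=$(echo "color colour" | sed -E 's/colou?r/X/g')
echo "$optional"   # prints "X X"
```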
Examples
Match everything except space:
[^ ]*
.*
also matches, but it is not equivalent: it matches any characters, spaces included.
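The two patterns can be compared side by side on an input that contains a space. This is a small sketch with a made-up input string.

```shell
# [^ ]* stops at the first space, so only "foo" is replaced.
non_space=$(echo "foo bar" | sed 's/[^ ]*/X/')
echo "$non_space"   # prints "X bar"

# .* matches the whole line, space included.
anything=$(echo "foo bar" | sed 's/.*/X/')
echo "$anything"    # prints "X"
```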
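Words (digits, alpha, _). The bracket expression is repeated so that at least one character is required; with a bare * the pattern also matches the empty string:
sed -e 's/[0-9a-zA-Z_][0-9a-zA-Z_]*/THIS_WAS_A_WORD/g'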
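A sketch of word matching that requires at least one word character, written by repeating the bracket expression before the *. The short replacement W and the sample input are chosen only to keep the output readable.

```shell
# One word character followed by zero or more word characters:
# each word is replaced, the spaces between words are left alone.
words=$(echo "foo  bar_1" | sed -e 's/[0-9a-zA-Z_][0-9a-zA-Z_]*/W/g')
echo "$words"   # prints "W  W"
```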
Blank spaces (spaces, tabs, newlines): \s is a GNU sed extension and does not work in every sed; the POSIX character class [[:space:]] is the portable alternative.
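One way to match whitespace without relying on \s is the POSIX character class [[:space:]]. The sketch below collapses each run of spaces and tabs into a single underscore; the input is illustrative.

```shell
# printf emits: a, space, tab, space, b, newline.
# [[:space:]][[:space:]]* matches one or more whitespace characters.
collapsed=$(printf 'a \t b\n' | sed 's/[[:space:]][[:space:]]*/_/g')
echo "$collapsed"   # prints "a_b"
```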
Regular Expression Syntax
TO NORMALIZE across java Regular Expression Syntax, grep Regular Expression Syntax, sed Regular Expression Syntax.