Sed Regular Expressions
Latest revision as of 21:46, 1 April 2024
External
* https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Internal
* Regular Expressions
Meta-Characters - Special Characters (need to be escaped in regular expressions)
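/ # delimiter of the s command, escape it inside the pattern
" # special to the shell inside a double-quoted script, not to sed itself
$ # unescaped signifies end of line
^ # unescaped signifies the beginning of a line
! # may trigger bash history expansion inside double quotes, not special to sed itself
[
]
:
* # zero or more
. # dot, matches any single character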
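A minimal sketch of the difference escaping makes, assuming a POSIX shell and a standard sed. The sample strings and variable names are made up for illustration.

```shell
# "." unescaped matches any character; escaped, it matches only a literal dot.
unescaped=$(printf '%s\n' 'a.b axb' | sed -e 's/a.b/X/g')
escaped=$(printf '%s\n' 'a.b axb' | sed -e 's/a\.b/X/g')
echo "$unescaped"   # X X
echo "$escaped"     # X axb

# "/" is the delimiter of the s command, so it is escaped inside the pattern.
slash=$(printf '%s\n' '/usr/bin' | sed -e 's/\/usr/\/opt/')
echo "$slash"       # /opt/bin
```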
The single quote is a special case: it cannot be escaped inside a single-quoted script, so to match it use its ASCII hexadecimal value prefixed by \x (a GNU sed extension):
\x27
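A short sketch of \x27 in use, assuming GNU sed (the \x escape is a GNU extension). The input string and the variable name are illustrative only.

```shell
# Replace every single quote with a double quote.
# \x27 is the hexadecimal escape for ' (assumes GNU sed).
requoted=$(printf '%s\n' "it's" | sed -e 's/\x27/"/g')
echo "$requoted"   # it"s
```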
To use () for grouping, they need to be escaped:
\(...\)
More details in Grouping below.
Non-Special Characters (do not need to be escaped in regular expressions)
<
>
(
)
!
-
{
}
+
+ by itself is not a meta-character; it matches a literal "+".
GNU sed treats \+ as "one or more occurrences of the preceding expression".
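The following sketch contrasts the literal and the repeating forms. It assumes GNU sed for \+; the aa* spelling is shown as a portable alternative, and the sample input is made up.

```shell
# Unescaped "+" is a literal plus sign.
literal=$(printf '%s\n' 'a+b aaab' | sed -e 's/a+b/X/g')
echo "$literal"    # X aaab

# \+ means "one or more of the preceding expression" (assumes GNU sed).
gnu=$(printf '%s\n' 'a+b aaab' | sed -e 's/a\+b/X/g')
echo "$gnu"        # a+b X

# Portable spelling of the same repetition: one "a", then zero or more.
portable=$(printf '%s\n' 'a+b aaab' | sed -e 's/aa*b/X/g')
echo "$portable"   # a+b X
```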
Grouping
Use \( and \) for grouping. Parentheses must be escaped to be interpreted as grouping delimiters; unescaped, they match literal parentheses.
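As an illustration of what groups are typically used for, the sketch below swaps two words with the back-references \1 and \2. The input and the variable name are hypothetical.

```shell
# Swap two words: \1 and \2 refer back to the first and second group.
swapped=$(printf '%s\n' 'john smith' | sed -e 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$swapped"   # smith john
```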
Brackets
Brackets mean "any one of"
[ab]
will match "a" or "b".
echo "blah" | sed -e 's/[ab]/x/g'
prints "xlxh".
Brackets and Negation
A leading ^ inside the brackets matches any single character except the specified ones. Followed by *, it matches a run of zero or more such characters. The behavior is different from the behavior of bash patterns on negation:
[^abc]*
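Two small examples of a negated bracket followed by *, assuming a standard sed. The inputs are made up for illustration.

```shell
# Strip everything up to and including the first "=".
value=$(printf '%s\n' 'key=value' | sed -e 's/[^=]*=//')
echo "$value"     # value

# [^abc]* stops at the first a, b or c.
masked=$(printf '%s\n' 'xyzabc' | sed -e 's/[^abc]*/_/')
echo "$masked"    # _abc
```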
Match Zero or One Character
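Normally, this would be achieved with ? placed after the character or the group of characters in question, but this does not work with standard sed, where an unescaped ? is a literal character. GNU sed accepts \? instead, or a plain ? when invoked with -E.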
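Three ways to express "zero or one" are sketched below. The \? form assumes GNU sed, -E assumes a sed that supports extended syntax (GNU and BSD both do), and \{0,1\} is the POSIX basic-regex interval. The sample input is hypothetical.

```shell
input='color colour'
# \? : zero or one of the preceding character (assumes GNU sed).
gnu=$(printf '%s\n' "$input" | sed -e 's/colou\?r/X/g')
# -E switches to extended syntax, where ? needs no backslash.
extended=$(printf '%s\n' "$input" | sed -E -e 's/colou?r/X/g')
# \{0,1\} is the POSIX basic-regex interval for the same thing.
interval=$(printf '%s\n' "$input" | sed -e 's/colou\{0,1\}r/X/g')
echo "$gnu"; echo "$extended"; echo "$interval"   # each prints: X X
```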
Examples
Match everything except space:
[^ ]*
.* matches as well, but it is greedy and also consumes spaces, so the two are not interchangeable.
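A small sketch comparing the two patterns on the same line, assuming a standard sed. The input string is made up.

```shell
line='first second third'
# [^ ]* stops at the first space.
word=$(printf '%s\n' "$line" | sed -e 's/[^ ]*/X/')
# .* is greedy and swallows the spaces as well.
all=$(printf '%s\n' "$line" | sed -e 's/.*/X/')
echo "$word"   # X second third
echo "$all"    # X
```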
Words (digits, alpha, _); the class is repeated so that the pattern cannot match the empty string:
sed -e 's/[0-9a-zA-Z_][0-9a-zA-Z_]*/THIS_WAS_A_WORD/g'
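A runnable variant on a sample line, assuming a standard sed. It spells the word pattern as one word character followed by zero or more, so an empty match is impossible; the input and the short replacement W are illustrative.

```shell
# Requiring at least one word character avoids empty matches.
words=$(printf '%s\n' 'foo_1, bar2' | sed -e 's/[0-9a-zA-Z_][0-9a-zA-Z_]*/W/g')
echo "$words"   # W, W
```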
Blank spaces (spaces, tabs, newlines): \s is a GNU extension and does not work in every sed; the POSIX class [[:space:]] is the portable alternative.
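One portable way to match whitespace is the POSIX bracket class [[:space:]]. The sketch below is an illustration with a made-up input, not the only possible approach.

```shell
# Collapse each run of spaces and tabs into a single underscore.
collapsed=$(printf 'a \t b\n' | sed -e 's/[[:space:]][[:space:]]*/_/g')
echo "$collapsed"   # a_b
```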
Regular Expression Syntax
TO NORMALIZE across java Regular Expression Syntax, grep Regular Expression Syntax, sed Regular Expression Syntax.