Is it possible to escape regex metacharacters reliably with sed
I'm wondering whether it is possible to write a 100% reliable sed command to escape any regex metacharacters in an input string so that it can be used in a subsequent sed command. Like this:

    #!/bin/bash
    # Trying to replace one regex by another in an input file with sed

    search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3"
    replace="/xyz\n\t[0-9]\+\([^ ]\)\{2,3\}\3"

    # Sanitize input
    search=$(sed 'script to escape' <<< "$search")
    replace=$(sed 'script to escape' <<< "$replace")

    # Use it in a sed command
    sed "s/$search/$replace/" input

I know that there are better tools to work with fixed strings instead of patterns, for example awk, perl or python. I would just like to prove whether it is possible or not with sed. Let's concentrate on basic POSIX regexes to have even more fun! :)
I have tried a lot of things, but every time I could find an input that broke my attempt. I thought that keeping it abstract as 'script to escape' would not lead anybody in the wrong direction.
Btw, the discussion came up here. I thought this could be a good place to collect solutions and probably break and/or elaborate on them.
Note:
- If you're looking for prepackaged functionality based on the techniques discussed in this answer:
  - bash functions that enable robust escaping even in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
  - @EdMorton's answer contains a tool (a bash script) that robustly performs single-line substitutions.
- All snippets assume bash as the shell (POSIX-compliant reformulations are possible).
Single-line solutions
Escaping a string literal for use as a regex in sed:
To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:

    search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3'                      # sample input containing metachars.
    searchescaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")  # escape it.
    sed -n "s/$searchescaped/foo/p" <<<"$search"                  # if ok, echoes 'foo'

- Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
  - Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
- Then, ^ chars. are escaped as \^.
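
To make the transformation concrete, here is a minimal sketch that prints the escaped form of a short string. The sample value a.b^c and the variable names are illustrative assumptions, not part of the original answer.

```shell
# Illustrative sample (an assumption): one regex metachar (.) and one caret.
sample='a.b^c'

# Same two substitutions as above: bracket every char except ^, then \-escape ^.
escaped=$(printf '%s\n' "$sample" | sed 's/[^^]/[&]/g; s/\^/\\^/g')

printf '%s\n' "$escaped"    # prints: [a][.][b]\^[c]
```

Every character ends up either inside its own bracket expression or, in the case of ^, behind a backslash.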
The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters (which vary across regex dialects) and instead relying on only 2 features shared by all regex dialects:
- the ability to specify literal characters inside a character set.
- the ability to escape a literal ^ as \^
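
As a rough check of the dialect-independence claim, the sketch below feeds the same escaped string to grep in both basic (BRE) and extended (ERE) mode. Using grep here is my own choice for illustration, and the sample text is hypothetical; behavior was reasoned about for GNU grep and is assumed, not verified, for other implementations.

```shell
# Hypothetical sample text containing chars that are special in BRE and/or ERE.
literal='1+1=2 (a.b*c) costs $3^2 [x]'
escaped=$(printf '%s\n' "$literal" | sed 's/[^^]/[&]/g; s/\^/\\^/g')

# -x demands a whole-line match, so a hit means every char was taken literally.
printf '%s\n' "$literal" | grep -x  -- "$escaped" >/dev/null && bre=ok   # basic regex
printf '%s\n' "$literal" | grep -xE -- "$escaped" >/dev/null && ere=ok   # extended regex

printf 'BRE: %s, ERE: %s\n' "${bre:-FAILED}" "${ere:-FAILED}"
```

Characters such as + and ( are ordinary in BRE but special in ERE, so a whole-line match in both modes suggests that the bracketed form sidesteps that difference.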
Escaping a string literal for use as the replacement string in sed's s/// command:
The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, ...), so these must be escaped, along with the (customary) regex delimiter, /.
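
A small sketch of what goes wrong when a placeholder is left unescaped; the sample strings and variable names are assumptions chosen for illustration:

```shell
replace='A & B'   # hypothetical replacement text containing the & placeholder

# Unescaped: & expands to the text matched by the regex ("bar").
unescaped=$(printf '%s\n' 'foo bar' | sed "s/bar/$replace/")

# \-escaped: & is inserted literally.
escaped=$(printf '%s\n' 'foo bar' | sed 's/bar/A \& B/')

printf '%s\n' "$unescaped"   # prints: foo A bar B
printf '%s\n' "$escaped"     # prints: foo A & B
```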
Assuming that the replacement string is a single-line string:

    replace='laurel & hardy; ps\2'                             # sample input containing metachars.
    replaceescaped=$(sed 's/[&/\]/\\&/g' <<<"$replace")        # escape it.
    sed -n "s/\(.*\) \(.*\)/$replaceescaped/p" <<<"foo bar"    # if ok, outputs $replace as is

Multi-line solutions
Escaping a multi-line string literal for use as a regex in sed:
Note: This only makes sense if multiple input lines (possibly all) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.

    # Define sample multi-line literal.
    search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
    /def\n\t[a-z]\+\([^ ]\)\{3,4\}\4'

    # Escape it.
    searchescaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')  #'

    # Use in a sed command that reads all input lines up front.
    # If ok, echoes 'foo'
    sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchescaped/foo/p" <<<"$search"

- The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
- $!a\'$'\n''\\n' appends the string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<).
- tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, therefore leaving subsequent commands to operate on all input lines at once.
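
The effect of the read-everything loop can be seen in isolation with a minimal sketch; the three-line sample input is an assumption for illustration:

```shell
# Without the loop, sed sees one line at a time, so \n never matches.
per_line=$(printf 'a\nb\nc\n' | sed 's/\n/,/g')

# With the loop, the pattern space holds all lines, joined by newlines.
slurped=$(printf 'a\nb\nc\n' | sed -e ':a' -e '$!{N;ba' -e '}' -e 's/\n/,/g')

printf '%s\n' "$per_line"   # prints a, b and c on three separate lines
printf '%s\n' "$slurped"    # prints: a,b,c
```

Here N appends the next input line to the pattern space and ba jumps back to label a, which repeats until the last line ($) has been read.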
Escaping a multi-line string literal for use as the replacement string in sed's s/// command:

    # Define sample multi-line literal.
    replace='laurel & hardy; ps\2
    masters\1 & johnson\2'

    # Escape it for use as a sed replacement string.
    IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
    replaceescaped=${REPLY%$'\n'}

    # If ok, outputs $replace as is.
    sed -n "s/\(.*\) \(.*\)/$replaceescaped/p" <<<"foo bar"

- Newlines in the input string must be retained as actual newlines, but \-escaped.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
- 's/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
- s/\n/\\&/g' then \-prefixes all actual newlines.
- IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
- ${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.
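
The difference between $(...) and read -d '' with respect to trailing newlines can be sketched as follows. This requires bash; the sample value and variable names are assumptions for illustration.

```shell
# Requires bash ($'...' quoting, process substitution, read -d).
value=$'two trailing newlines\n\n'    # hypothetical sample value

# Command substitution strips ALL trailing newlines.
via_subst=$(printf '%s' "$value")

# read -d '' reads up to a NUL byte (here: end of input), keeping the newlines.
# It returns non-zero when it hits end of input, hence the `|| true`.
IFS= read -r -d '' via_read < <(printf '%s' "$value") || true

printf '%s\n' "${#value} ${#via_subst} ${#via_read}"   # prints: 23 21 23
```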
bash functions based on the above (for sed):
- quotere() quotes (escapes) for use in a regex
- quotesubst() quotes for use in the substitution string of a s/// call.
- both handle multi-line input correctly
  - Note that because sed reads a single line at a time by default, use of quotere() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
  - Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedvalue < <(quotesubst "$value")

    # SYNOPSIS
    #   quotere <text>
    quotere() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

    # SYNOPSIS
    #   quotesubst <text>
    quotesubst() {
      IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
      printf %s "${REPLY%$'\n'}"
    }

Example:

    from=$'cost\(*):\n$3.'                 # sample input containing metachars.
    to='you & i'$'\n''eating a\1 sauce.'   # sample replacement string with metachars.

    # Should print the unmodified value of $to
    sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quotere "$from")/$(quotesubst "$to")/" <<<"$from"

Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
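
If the two functions are used often, they could be wrapped in a small helper. The sketch below is one possible shape: the name replaceliteral and the sample strings are hypothetical and not part of the original answer, and the two quoting functions are repeated only so that the block runs on its own (bash required).

```shell
# Copies of the two functions defined above, so that this block is self-contained.
quotere() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quotesubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

# Hypothetical helper: replace the first literal occurrence of $1 with $2,
# reading all of stdin at once so that multi-line values can match.
replaceliteral() {
  sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quotere "$1")/$(quotesubst "$2")/"
}

result=$(printf 'price: $3.00 (approx.)\n' | replaceliteral '$3.00' 'a/b & c\1')
printf '%s\n' "$result"    # prints: price: a/b & c\1 (approx.)
```

Because the helper calls both functions through $(...), it inherits the limitation noted above for values with trailing newlines.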
perl solution:
Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:

    from=$'cost\(*):\n$3.'                              # sample input containing metachars.
    to='you owe me $1/$& for'$'\n''eating a\1 sauce.'   # sample replacement string w/ metachars.

    # Should print the unmodified value of $to.
    # Note that the replacement value needs no escaping.
    perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"

- Note the use of -0777 to read all input at once, so that the multi-line substitution works.
- The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.