Is it possible to escape regex metacharacters reliably with sed?
I'm wondering whether it is possible to write a 100% reliable sed command that escapes any regex metacharacters in an input string, so that the result can be used in a subsequent sed command. Like this:
    #!/bin/bash
    # Trying to replace a regex in an input file with sed

    search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3"
    replace="/xyz\n\t[0-9]\+\([^ ]\)\{2,3\}\3"

    # Sanitize input
    search=$(sed 'script to escape' <<< "$search")
    replace=$(sed 'script to escape' <<< "$replace")

    # Use it in a sed command
    sed "s/$search/$replace/" input
I know that there are better tools for working with fixed strings instead of patterns, for example awk, perl or python. I would just like to prove whether or not it is possible with sed. Let's concentrate on basic POSIX regexes to have even more fun! :)
I have tried a lot of things, but every time I could find an input that broke my attempt. I thought keeping it abstract as 'script to escape' would not lead anybody in the wrong direction.
Btw, the discussion came up here. I thought this would be a good place to collect solutions, and to break and/or elaborate on them.
Note:
- If you're looking for prepackaged functionality based on the techniques discussed in this answer:
  - bash functions that enable robust escaping in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
  - @EdMorton's answer contains a tool (a bash script) that robustly performs single-line substitutions.
- All snippets assume bash as the shell (POSIX-compliant reformulations are possible).
Single-line solutions
Escaping a string literal for use as a regex in sed:
To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:
    search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3'  # Sample input containing metachars.

    searchescaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")  # Escape it.

    sed -n "s/$searchescaped/foo/p" <<<"$search"  # If ok, echoes 'foo'.
- Every character except ^ is placed in its own character set [...] expression, which makes it a literal.
  - Note that ^ is the one character you cannot represent as [^], because it has a special meaning in that position (negation).
- Then, ^ characters are escaped as \^.
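To make the two substitutions concrete, here is a sketch that traces them on a short sample string and prints the intermediate results. The value 'a.^*' is an arbitrary choice for illustration, not taken from the answer:

```shell
#!/usr/bin/env bash
s='a.^*'   # hypothetical sample containing '.', '^' and '*'

# Step 1: wrap every char except '^' in its own bracket expression.
step1=$(sed 's/[^^]/[&]/g' <<<"$s")
printf '%s\n' "$step1"                 # prints: [a][.]^[*]

# Step 2: backslash-escape the remaining '^' chars.
step2=$(sed 's/\^/\\^/g' <<<"$step1")
printf '%s\n' "$step2"                 # prints: [a][.]\^[*]

# The escaped string now matches the original text literally.
sed -n "s/$step2/MATCHED/p" <<<"$s"    # prints: MATCHED
```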
The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters (which vary across regex dialects), and instead focusing on only 2 features shared by all regex dialects:
- the ability to specify literal characters inside a character set.
- the ability to escape a literal ^ as \^.
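The claim that the escaped form relies only on features shared by all dialects can be spot-checked. This sketch assumes a POSIX grep that supports -E and feeds the same escaped string to it once as a basic and once as an extended regex; the sample text is made up:

```shell
#!/usr/bin/env bash
# Made-up text mixing chars that are special in BRE and/or ERE: ( ) + { } | ^
text='f(x)+y^2 {1,3}|z'
escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$text")

grep -c -- "$escaped" <<<"$text"      # basic regex: prints 1
grep -E -c -- "$escaped" <<<"$text"   # extended regex: prints 1
```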
Escaping a string literal for use as the replacement string in sed's s/// command:
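The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer either to the entire string matched by the regex (&) or to specific capture-group results by index (\1, \2, ...). These must therefore be escaped, along with the (customary) regex delimiter, /. Escaping \ itself is what neutralizes the \1, \2, ... placeholders.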
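A quick way to see why this escaping matters is to run the same replacement unescaped and escaped. The sketch below uses a made-up input and the escaping command shown next:

```shell
#!/usr/bin/env bash
replace='Laurel & Hardy'   # hypothetical replacement containing '&'

# Unescaped: sed substitutes the matched text ('bar') for '&'.
sed "s/bar/$replace/" <<<'bar'            # prints: Laurel bar Hardy

# Escaped: '&', '/' and '\' get a backslash prefix, so '&' stays literal.
replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"$replace")
sed "s/bar/$replace_escaped/" <<<'bar'    # prints: Laurel & Hardy
```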
Assuming that the replacement string is a single-line string:
    replace='Laurel & Hardy; PS\2'  # Sample input containing metachars.

    replaceescaped=$(sed 's/[&/\]/\\&/g' <<<"$replace")  # Escape it.

    sed -n "s/\(.*\) \(.*\)/$replaceescaped/p" <<<"foo bar"  # If ok, outputs $replace as is.
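If both single-line commands are needed together, they can be wrapped in one helper. The function name replace_literal_line is hypothetical, it assumes single-line arguments, and it adds the g flag so that every occurrence is replaced; treat it as a sketch rather than a finished tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace literal $1 with literal $2 on each line of stdin.
# Both arguments are assumed to be single-line strings.
replace_literal_line() {
  local searchEsc replaceEsc
  searchEsc=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1")   # regex escaping
  replaceEsc=$(sed 's/[&/\]/\\&/g' <<<"$2")             # replacement escaping
  sed "s/$searchEsc/$replaceEsc/g"
}

# Metachars on both sides are treated literally.
printf '%s\n' 'price: $3.* (approx)' | replace_literal_line '$3.*' '\1 & co/'
# prints: price: \1 & co/ (approx)
```

Because the helper reads standard input, it fits into a pipeline; for a file, redirect it in with < file.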
Multi-line solutions
Escaping a multi-line string literal for use as a regex in sed:
Note: This only makes sense if multiple input lines (possibly all) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.
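The effect of that default is easy to observe. In this sketch (the sample text is invented), a regex containing \n fails in sed's default line-by-line mode and succeeds once all lines have been read into the pattern space with the POSIX loop -e ':a' -e '$!{N;ba' -e '}':

```shell
#!/usr/bin/env bash
input=$'foo\nbar'   # hypothetical two-line input

# Default mode: sed sees 'foo' and 'bar' separately, so nothing matches.
sed 's/foo\nbar/X/' <<<"$input"                                  # prints foo, then bar

# Read all lines into the pattern space first; now '\n' can match.
sed -e ':a' -e '$!{N;ba' -e '}' -e 's/foo\nbar/X/' <<<"$input"   # prints X
```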
    # Define sample multi-line literal.
    search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
    /def\n\t[a-z]\+\([^ ]\)\{3,4\}\4'

    # Escape it.
    searchescaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')

    # Use in a sed command that reads all input lines up front.
    # If ok, echoes 'foo'.
    sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchescaped/foo/p" <<<"$search"
- The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
- $!a\'$'\n''\\n' appends the string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<).
- tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, leaving subsequent commands to operate on all input lines at once.
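To see what each stage of that pipeline produces, the following sketch runs it on a hypothetical two-line string and prints the sed output before tr, the final escaped regex, and the match result:

```shell
#!/usr/bin/env bash
search=$'a.\nb*'   # hypothetical two-line search string

# Before tr: each escaped line, plus a '\n' line after every line but the last.
sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search"
# prints three lines:
#   [a][.]
#   \n
#   [b][*]

# tr -d '\n' joins everything into a single regex string.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')
printf '%s\n' "$searchEscaped"   # prints: [a][.]\n[b][*]

# Match against the whole input, read up front with the N loop.
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"   # prints: foo
```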
Escaping a multi-line string literal for use as the replacement string in sed's s/// command:
    # Define sample multi-line literal.
    replace='Laurel & Hardy; PS\2
    Masters\1 & Johnson\2'

    # Escape it for use as a sed replacement string.
    IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
    replaceescaped=${REPLY%$'\n'}

    # If ok, outputs $replace as is.
    sed -n "s/\(.*\) \(.*\)/$replaceescaped/p" <<<"foo bar"
- Newlines in the input string must be retained as actual newlines, but \-escaped.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
- s/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
- s/\n/\\&/g then \-prefixes all actual newlines.
- IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
- ${REPLY%$'\n'} then removes a single trailing newline, which <<< has implicitly appended to the input.
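The following sketch traces those steps on a hypothetical two-line replacement containing &, / and a newline, then uses the result in an s/// command whose regex matches the whole input line:

```shell
#!/usr/bin/env bash
replace=$'A & B\nC/D'   # hypothetical two-line replacement

IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replace_escaped=${REPLY%$'\n'}
printf '%s\n' "$replace_escaped"
# prints two lines:
#   A \& B\
#   C\/D

# Replacing a whole line with it reproduces $replace verbatim (two lines).
sed "s/.*/$replace_escaped/" <<<'anything'
```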
bash functions based on the above (for sed):
- quotere() quotes (escapes) for use in a regex.
- quotesubst() quotes for use in the substitution string of an s/// call.
- Both handle multi-line input correctly.
  - Note that because sed reads a single line at a time by default, use of quotere() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
  - Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedvalue < <(quotesubst "$value")
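The trailing-newline caveat can be demonstrated without sed at all. In this sketch, emit_as_is is a hypothetical stand-in for quotesubst that simply prints its argument unchanged; the string lengths show what each capture method keeps:

```shell
#!/usr/bin/env bash
emit_as_is() { printf %s "$1"; }   # hypothetical stand-in, prints $1 unchanged

value=$'line\n\n'                  # 'line' followed by two newlines (length 6)

viaSubst=$(emit_as_is "$value")    # command substitution strips trailing newlines
printf '%d\n' "${#viaSubst}"       # prints: 4

IFS= read -d '' -r viaRead < <(emit_as_is "$value")   # keeps them
printf '%d\n' "${#viaRead}"        # prints: 6
```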
    # SYNOPSIS
    #   quotere <text>
    quotere() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
    # SYNOPSIS
    #   quotesubst <text>
    quotesubst() {
      IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
      printf %s "${REPLY%$'\n'}"
    }
Example:
    from=$'Cost\(*):\n$3.'  # Sample input containing metachars.
    to='You & I'$'\n''eating A\1 sauce.'  # Sample replacement string with metachars.

    # Should print the unmodified value of $to.
    sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quotere "$from")/$(quotesubst "$to")/" <<<"$from"
Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
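To gain some confidence in the functions, a small round-trip check can replace a literal string with a literal replacement and compare the output to that replacement. The helper roundtrip_check and the test strings are hypothetical, and the function definitions are repeated so that the block runs on its own:

```shell
#!/usr/bin/env bash
quotere() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quotesubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

# Hypothetical helper: use $1 as both the input and the literal search string,
# replace it with literal $2, and print "ok" if the output equals $2 exactly.
roundtrip_check() {
  local out
  out=$(sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quotere "$1")/$(quotesubst "$2")/" <<<"$1")
  if [ "$out" = "$2" ]; then echo ok; else echo "FAIL: $1"; fi
}

# Made-up cases mixing regex and replacement metachars, single- and multi-line.
roundtrip_check 'a.b*c' 'x & y'
roundtrip_check $'^[x]\n$1' $'one\\1\ntwo/'
roundtrip_check '(1+2)=3?' '\n & \t'
```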
Perl solution:
Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:
    from=$'Cost\(*):\n$3.'  # Sample input containing metachars.
    to='You owe me $1/$& for'$'\n''eating A\1 sauce.'  # Sample replacement string w/ metachars.

    # Should print the unmodified value of $to.
    # Note that the replacement value needs no escaping.
    perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"
- Note the use of -0777 to read all input at once, so that the multi-line substitution works.
- The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.