bash - Grep a string between two HTML comments in pages
I have to report on how many times a CSS class appears in the content of our pages (over 10k pages). The trouble is, the header and footer contain the class, so a grep returns every single page (not useful).
So, how do I grep only the content?
I have a <!-- main content --> and an <!-- end content --> comment on every page.
So how do I grep (or do I even use grep?) between those comments?
This is hosted on a Linux server, and I have access to grep, awk and sed.
Ideally I'd get a report (.txt or .csv) of pages and the line numbers where the class shows up, but just a list of pages would suffice.
Thanks!
The following script does what was requested: it prints the files and line numbers where a CSS class name occurs:
    #!/bin/sh
    # Usage: class_find.sh CLASS_NAME
    pattern="class=\"([A-Za-z0-9_-]* )*$1( [A-Za-z0-9_-]*)*\""
    awk -v pat="$pattern" '
    # y is 1 only while we are between the two comments
    /<!-- main content -->/ {y=1}
    /<!-- end content -->/ {y=0}
    y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;}
    END {for (k in f) printf "%s\tlines:%s\n", k,f[k];}
    ' *.html
Save it as class_find.sh and use it like this:
    class_find.sh 'my_class'
where my_class is the class name you want to search for.
Output:
    2.html  lines: 7 9
    1.html  lines: 5
Some explanation:
pattern="class=\"([A-Za-z0-9_-]* )*$1( [A-Za-z0-9_-]*)*\""
: search for class="my_class" or class="others my_class" or class="my_class others"
/<!-- main content -->/ {y=1}
: when this string is found, set the flag y to true
/<!-- end content -->/ {y=0}
: set the flag y to false
y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;}
: if the flag y is true and a match for the class is found in the current line ($0), save the line number in the associative array f, with the file name as the key
END {for (k in f) printf "%s\tlines:%s\n", k,f[k];}
: after reading all the files, print the results in a nice format (awk does not guarantee the order of for (k in f), which is why 2.html can come before 1.html)
*.html
: operate on all the HTML files found in the current directory
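Since the question also mentions a .csv report and pages that may not all sit in one directory, here is a rough sketch of a variant, not part of the original answer. It assumes the same two marker comments and the same class pattern. The function name class_find_csv and its optional directory argument are made up for illustration. It prints one file,line row per match instead of collecting them in an array, walks subdirectories with find, and resets the flag at the start of each file (an addition, in case a page is missing its end comment). File names containing commas would break the CSV, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   class_find_csv CLASS_NAME [DIRECTORY]
# Prints "file,line" for every line between the two marker comments
# whose class attribute contains CLASS_NAME.
class_find_csv() {
    class=$1
    dir=${2:-.}    # assumption: default to the current directory
    pattern="class=\"([A-Za-z0-9_-]* )*${class}( [A-Za-z0-9_-]*)*\""
    # find batches the files into as few awk calls as possible
    find "$dir" -type f -name '*.html' -exec awk -v pat="$pattern" '
        FNR == 1                { y = 0 }   # new file: start outside the content
        /<!-- main content -->/ { y = 1 }   # entering the content area
        /<!-- end content -->/  { y = 0 }   # leaving the content area
        y && $0 ~ pat           { printf "%s,%d\n", FILENAME, FNR }
    ' {} +
}
```

For example, class_find_csv my_class . > report.csv would write the rows to a file, and class_find_csv my_class . | cut -d, -f1 | sort -u would reduce that to a plain list of pages.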