bash - Grep a string between two HTML comments in pages
I need a report on how many times a CSS class appears in the content of our pages (over 10k pages). The trouble is that the header and footer also contain the class, so grep returns every single page (not useful).
So how do I grep only the content?
I have `<!-- main content -->` and `<!-- end content -->` comments on every page.
So how do I grep between those comments?
This is hosted on a Linux server, and I have access to grep, awk and sed.
Ideally, the report (.txt or .csv) would show the pages and the line numbers where the class shows up, but a list of pages would suffice.
Thanks!
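For the literal question (grep only between the two comments), one minimal sketch is to let sed cut out the content region with an address range and pipe that into grep, once per page. This assumes each page has exactly one start marker and one end marker, each on its own line. `count_in_content` is a hypothetical helper name, and the plain `grep -c` is a substring match, so it would also count lines containing `my_class2` or the bare word in visible text.

```shell
# Hypothetical helper: count_in_content CLASS FILE...
# Prints "file,count" for each page where CLASS appears inside the
# <!-- main content --> ... <!-- end content --> region.
count_in_content() {
  class=$1
  shift
  for file in "$@"; do
    # sed -n with an address range prints only the lines from the start
    # marker through the end marker (inclusive); grep -c counts the
    # matching lines in that region (it prints 0 when nothing matches).
    n=$(sed -n '/<!-- main content -->/,/<!-- end content -->/p' "$file" \
        | grep -c -- "$class")
    # Only report pages where the class actually occurs in the content.
    if [ "$n" -gt 0 ]; then
      printf '%s,%s\n' "$file" "$n"
    fi
  done
}
```

Usage: `count_in_content my_class *.html > report.csv`. Note that the count is the number of matching lines, not the number of occurrences.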
The following script does what you requested: it prints the files and the line numbers where the CSS class name occurs:
```
#!/bin/sh

pattern="class=\"([a-zA-Z0-9_-]* )*$1( [a-zA-Z0-9_-]*)*\""

awk -v pat="$pattern" '
/<!-- main content -->/ {y=1}
/<!-- end content -->/ {y=0}
y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;}
END {for (k in f) printf "%s\tlines:%s\n", k,f[k];}
' *.html
```

Save it as `class_find.sh` and use it like this:
```
sh class_find.sh 'my_class'
```

where `my_class` is the class name you want to search for.
Output:

```
2.html	lines: 7 9
1.html	lines: 5
```

Some explanation:
- `pattern="class=\"([a-zA-Z0-9_-]* )*$1( [a-zA-Z0-9_-]*)*\""`: search for `class="my_class"`, `class="others my_class"` or `class="my_class others"`
- `/<!-- main content -->/ {y=1}`: when this string is found, set the flag `y` to true
- `/<!-- end content -->/ {y=0}`: set the flag `y` to false
- `y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;}`: if the flag `y` is true and the class is found in the current line (`$0`), append the line number to the associative array `f`, keyed by file name
- `END {for (k in f) printf "%s\tlines:%s\n", k,f[k];}`: after reading all files, print the results in a nice format (the iteration order of `for (k in f)` is unspecified, which is why `2.html` can appear before `1.html`)
- `*.html`: operate on all HTML files found in the current directory
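Since the question asks for a .csv report, here is a hedged variant of the same awk logic that prints a header row and then one `file,line` row per match, instead of grouping the line numbers per file. It also resets `y` on the first line of each file (`FNR == 1`), so a page that is missing its `<!-- end content -->` marker cannot leak the flag into the next file. `class_find_csv` is a hypothetical name, not part of the answer above.

```shell
# Hypothetical helper: class_find_csv CLASS FILE...
# Same matching rules as class_find.sh, but CSV output: "file,line".
class_find_csv() {
  # Matches CLASS as a whole word inside a class="..." attribute.
  pattern="class=\"([a-zA-Z0-9_-]* )*$1( [a-zA-Z0-9_-]*)*\""
  shift
  awk -v pat="$pattern" '
    BEGIN { print "file,line" }              # CSV header row
    FNR == 1 { y = 0 }                       # reset the flag for each new file
    /<!-- main content -->/ { y = 1 }        # entering the content region
    /<!-- end content -->/  { y = 0 }        # leaving the content region
    y && $0 ~ pat { print FILENAME "," FNR } # one row per matching line
  ' "$@"
}
```

Usage: `class_find_csv my_class *.html > report.csv`. Rows come out in the order awk reads the files, which for `*.html` is the shell's sorted glob order.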