bash - Grep a string between two HTML comments in pages

I have to report on how many times a CSS class appears in the content of our pages (over 10k pages). The trouble is that the header and footer contain the class, so grep returns every single page, which is not useful.

So, how do I grep only the content?

I have <!-- main content --> and <!-- end content --> comments on every page.

So how do I grep (or do I even use grep?) between those comments?

This is hosted on a Linux server, and I have access to grep, awk and sed.

Ideally I'd get a report (.txt or .csv) with the pages and line numbers where the class shows up, but a list of pages would suffice.


The following script does what was requested: it prints the files and line numbers where the CSS class name occurs:

    #!/bin/sh
    # $1 is the class name to search for
    pattern="class=\"([a-zA-Z0-9_-]* )*$1( [a-zA-Z0-9_-]*)*\""

    awk -v pat="$pattern" '
        /<!-- main content -->/ {y=1}
        /<!-- end content -->/ {y=0}
        y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;}
        END {for (k in f) printf "%s\tlines:%s\n", k,f[k];}
    ' *.html

Save it as a script and run it with the class name as its only argument, for example 'my_class', where my_class is the class name you want to search for.


Example output:

    2.html  lines: 7 9
    1.html  lines: 5
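Since the question mentions a .csv report, here is a minimal, self-contained sketch of how the same flag logic could emit one "file,line" row per match. The sample pages, the class name promo and the CSV layout are assumptions made up for this demo, and the FNR == 1 reset is an addition that is not in the script above (it stops a page with a missing end comment from leaking the flag into the next file).

```shell
#!/bin/sh
# Demo only: page contents, the class name "promo" and the CSV layout
# are hypothetical and exist just to make the example reproducible.
dir=$(mktemp -d)

cat > "$dir/1.html" <<'EOF'
<div class="header promo">site header</div>
<!-- main content -->
<p class="promo">first hit</p>
<!-- end content -->
<div class="promo footer">site footer</div>
EOF

cat > "$dir/2.html" <<'EOF'
<div class="promo">site header</div>
<!-- main content -->
<p class="intro">no hit here</p>
<p class="lead promo">second hit</p>
<p class="promo wide">third hit</p>
<!-- end content -->
EOF

pattern="class=\"([a-zA-Z0-9_-]* )*promo( [a-zA-Z0-9_-]*)*\""

# Same flag logic as the script above, but one "file,line" row per match.
report=$(cd "$dir" && awk -v pat="$pattern" '
    FNR == 1                {y = 0}   # added: reset the flag for each new file
    /<!-- main content -->/ {y = 1}
    /<!-- end content -->/  {y = 0}
    y && $0 ~ pat           {print FILENAME "," FNR}
' *.html)

# Print a header row followed by the matches, then clean up.
printf 'file,line\n%s\n' "$report"
rm -rf "$dir"
```

Redirecting the output to a file gives the report, and piping the rows through cut -d, -f1 | sort -u would reduce them to a plain list of pages.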

Some explanation:

  • pattern="class=\"([a-zA-Z0-9_-]* )*$1( [a-zA-Z0-9_-]*)*\"" : searches for class="my_class", class="others my_class" or class="my_class others"
  • /<!-- main content -->/ {y=1} : when this string is found, set the flag y to true; /<!-- end content -->/ {y=0} : set the flag y to false
  • y && $0 ~ pat {f[FILENAME] = f[FILENAME]" "FNR;} : if the flag y is true and the class is matched in the current line ($0), append the line number (FNR) to the associative array f under the key FILENAME.
  • END {for (k in f) printf "%s\tlines:%s\n", k,f[k];} : after all files have been read, print the results in a readable format
  • *.html : operate on all HTML files found in the current directory
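To see what the pattern accepts and rejects, it can be run on its own against a few sample attributes. This is only a sketch: my_class and the sample lines are made-up test data, not taken from real pages.

```shell
#!/bin/sh
# Illustration only: "my_class" and the sample lines are hypothetical.
pattern="class=\"([a-zA-Z0-9_-]* )*my_class( [a-zA-Z0-9_-]*)*\""

result=$(printf '%s\n' \
    '<p class="my_class">' \
    '<p class="others my_class">' \
    '<p class="my_class others">' \
    '<p class="a my_class b">' \
    '<p class="my_class_wide">' \
    '<p class="not_my_class">' \
    "<p class='my_class'>" |
awk -v pat="$pattern" '{
    # Label each input line according to whether the pattern matches it.
    verdict = ($0 ~ pat) ? "match" : "no match"
    print verdict ": " $0
}')

printf '%s\n' "$result"
```

The first four lines are reported as matches. The last three are not: the pattern requires the class name to be a whole space-separated token, and it only recognises double-quoted attributes, so pages that use single quotes or separate class names with something other than a single space would need a looser pattern.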

