python - Regex not working for multiple pattern occurrences


I want to grab every first occurrence of the string that starts with "genome_" and ends before ",(", and replace it with a particular string, e.g. "xxx",

in the text below:

(id_bxylanisolvens_nlae-zl-c182_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1278-2120_1_^^neighbours_id_bxylanisolvens_nlae-zl-c182_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c182_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,((id_bxylanisolvens_nlae-zl-g421_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1315-2157_1_^^neighbours_id_bxylanisolvens_nlae-zl-g421_genome_orf00002_1__id_bxylanisolvens_nlae-zl-g421_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,id_bxylanisolvens_nlae-zl-c339_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1084-1926_1_^^neighbours_id_bxylanisolvens_nlae-zl-c339_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c339_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068)28:0.00000230914009336068,(

Desired result:

(id_bxylanisolvens_nlae-zl-c182_xxx,((id_bxylanisolvens_nlae-zl-g421_xxx,(

Based on your sample data and desired output, a positive look-around should help:

(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3,3}_)(genome.*?)(?=,\() 
  • (?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3,3}_) looks behind and checks for this particular sequence of characters without consuming it. It might need adjustment depending on your actual data's variability.
  • (genome.*?) captures the bit to replace; the question mark makes the match non-greedy, so it stops at the first delimiter it can.
  • (?=,\() looks ahead for the character combination that delimits the portion to be dropped.
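As a rough sketch of how the pattern could be applied in Python with re.sub: the text below is a shortened, made-up stand-in for the tree string in the question (same ids, everything after "genome_" abbreviated), and the second, looser pattern is only an assumption about how the ids might vary, not something taken from the real data. Python's re module only accepts fixed-width look-behinds, so the looser variant uses a capture group and puts the prefix back in the replacement.

```python
import re

# Shortened, made-up stand-in for the tree string in the question.
text = (
    "(id_bxylanisolvens_nlae-zl-c182_genome_orf00003_a:0.1,"
    "((id_bxylanisolvens_nlae-zl-g421_genome_orf00003_b:0.1,"
    "id_bxylanisolvens_nlae-zl-c339_genome_orf00003_c:0.1)28:0.1,("
)

# Look-behind for the fixed-width id prefix, lazy match from "genome",
# look-ahead for the ",(" delimiter (neither look-around is consumed).
pattern = re.compile(r"(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3}_)genome.*?(?=,\()")
result = pattern.sub("xxx", text)
print(result)
# (id_bxylanisolvens_nlae-zl-c182_xxx,((id_bxylanisolvens_nlae-zl-g421_xxx,(

# Assumed variant for ids of varying length: capture the prefix instead of
# looking behind, then re-insert it with \1.
loose_pattern = re.compile(r"(id_[a-z]+_nlae-zl-[a-z]\d+_)genome.*?(?=,\()")
loose_result = loose_pattern.sub(r"\1xxx", text)
print(loose_result == result)
```

Note that the c339 entry disappears because it sits before the next ",(" and is swallowed by the lazy match that started at the g421 entry, which is what the desired result asks for.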

See it in action: regex101.
Please comment if further detail or adjustment is required.

