python - Regex not working for multiple pattern occurrences
I want to grab every first occurrence of the string starting with "genome_" and ending before ",(", and replace that particular string with "xxx" in the text below:
(id_bxylanisolvens_nlae-zl-c182_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1278-2120_1_^^neighbours_id_bxylanisolvens_nlae-zl-c182_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c182_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,((id_bxylanisolvens_nlae-zl-g421_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1315-2157_1_^^neighbours_id_bxylanisolvens_nlae-zl-g421_genome_orf00002_1__id_bxylanisolvens_nlae-zl-g421_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,id_bxylanisolvens_nlae-zl-c339_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1084-1926_1_^^neighbours_id_bxylanisolvens_nlae-zl-c339_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c339_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068)28:0.00000230914009336068,(
Desired result:
(id_bxylanisolvens_nlae-zl-c182_xxx,((id_bxylanisolvens_nlae-zl-g421_xxx,(
Based on the sample data and the desired output, a positive look-around should help:
(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3,3}_)(genome.*?)(?=,\()
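In Python, the pattern can be applied with `re.sub`, which replaces every non-overlapping match in a single left-to-right pass. A minimal sketch using the sample text from the question; the names `SAMPLE` and `PATTERN` are just illustrative:

```python
import re

# The sample input from the question, split into adjacent string literals
# (Python joins them) so each tree record sits on its own line.
SAMPLE = (
    "(id_bxylanisolvens_nlae-zl-c182_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1278-2120_1_^^neighbours_id_bxylanisolvens_nlae-zl-c182_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c182_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,"
    "((id_bxylanisolvens_nlae-zl-g421_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1315-2157_1_^^neighbours_id_bxylanisolvens_nlae-zl-g421_genome_orf00002_1__id_bxylanisolvens_nlae-zl-g421_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068,"
    "id_bxylanisolvens_nlae-zl-c339_genome_orf00003____bxylanisolvens_nlae-.._843_unknown___1084-1926_1_^^neighbours_id_bxylanisolvens_nlae-zl-c339_genome_orf00002_1__id_bxylanisolvens_nlae-zl-c339_genome_orf00004_1__neighbour_genes_bxylanisolvens_nlae-.._bxylanisolvens_nlae-..:0.00000230914009336068)28:0.00000230914009336068,("
)

# Raw string so the backslashes reach the regex engine unchanged.
# The look-behind is fixed width (31 characters), which Python's re requires.
PATTERN = re.compile(r"(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3,3}_)(genome.*?)(?=,\()")

# Only the "genome..." part is consumed and replaced; the ID prefix
# (look-behind) and the ",(" delimiter (look-ahead) stay in the output.
result = PATTERN.sub("xxx", SAMPLE)
print(result)
# (id_bxylanisolvens_nlae-zl-c182_xxx,((id_bxylanisolvens_nlae-zl-g421_xxx,(
```

Note that the c339 record is not matched separately: the g421 match already runs through it up to the final ",(", which is exactly what the desired result asks for.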
(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3,3}_) looks behind and checks for that particular sequence of characters. It might need adjustment depending on your actual data's variability.
(genome.*?) catches the bit to replace; the question mark makes it non-greedy.
(?=,\() looks ahead for the character combination that delimits the dropped portion.
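To see why the question mark matters, the sketch below compares the lazy `.*?` with a greedy `.*` on a shortened, made-up string of the same shape (the IDs `a001`/`b002` and the short record bodies are invented for illustration). The greedy version backtracks only to the *last* ",(", so both records collapse into one match. The last substitution is a hypothetical variant for IDs whose length varies: Python's `re` rejects variable-width look-behinds, so the prefix is captured and written back with `\1` instead.

```python
import re

# Made-up, shortened string with the same shape as the question's data:
# two records, each "id_..._genome..." followed by ",(".
TOY = "(id_bxylanisolvens_nlae-zl-a001_genome_orf1:0.1,((id_bxylanisolvens_nlae-zl-b002_genome_orf2:0.2)28:0.3,("

PREFIX = r"(?<=id_bxylanisolvens_nlae-zl-[a-z]\d{3}_)"

# Lazy: each match stops at the first ",(" after "genome".
lazy = re.sub(PREFIX + r"(genome.*?)(?=,\()", "xxx", TOY)

# Greedy: ".*" runs to the end and backtracks only to the LAST ",(",
# so the second record is swallowed by the first match.
greedy = re.sub(PREFIX + r"(genome.*)(?=,\()", "xxx", TOY)

# Hypothetical variant for variable-length IDs: capture the prefix instead of
# looking behind for it, then restore it with \1. [^,()] stops the prefix
# from running across record boundaries.
variable = re.sub(r"(id_[^,()]*?_)genome.*?(?=,\()", r"\1xxx", TOY)

print(lazy)      # (id_bxylanisolvens_nlae-zl-a001_xxx,((id_bxylanisolvens_nlae-zl-b002_xxx,(
print(greedy)    # (id_bxylanisolvens_nlae-zl-a001_xxx,(
print(variable)  # same as lazy
```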
See it in action: regex101.
Please comment if further detail or adjustment is required.