python - Choosing reads with Hamming distance zero


I have a FASTQ file, reads.fastq, and a list of 7-mer strings. For each read in reads.fastq, I want to check whether it contains at least one of the 7-mer strings in the list. The condition is that if a match is found (Hamming distance == 0), the read is written to an array chosen_reads and the next read from the FASTQ file is checked. If no match is found, the loop continues until a match is found. The output array should consist of unique reads, since the matching loop should terminate once the first match is found. I wrote the following code, but the reads in the output array are not unique, because all matches with Hamming distance 0 are reported. Please suggest edits:

    import Bio.SeqIO

    def hamming(s1, s2):
        # Return the Hamming distance between equal-length sequences
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

    reads_array = []
    for x in Bio.SeqIO.parse("reads.fastq", "fastq"):
        reads_array.append(x)

    nmer = 7
    l_chosen = ['gttattt', 'attattt', 'tgctagt']

    chosen_reads = []
    for x in reads_array:
        s2 = str(x.seq)
        for s in [s2[i:i+nmer] for i in range(len(s2)-nmer-1)]:
            for ds in l_chosen:
                dist = hamming(ds, s)
                if dist == 0:
                    print(s2, s, ds, dist)
                    chosen_reads.append(x)

Your current code does not break out of the loops to move on to the next read from reads.fastq once it has found a string with Hamming distance 0. You should use a flag to decide when to break out, and set that flag to True when you need to break out:

    import Bio.SeqIO

    def hamming(s1, s2):
        # Return the Hamming distance between equal-length sequences
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

    reads_array = []
    for x in Bio.SeqIO.parse("reads.fastq", "fastq"):
        reads_array.append(x)

    nmer = 7

    l_chosen = ['gttattt', 'attattt', 'tgctagt']
    chosen_reads = []

    for x in reads_array:
        s2 = str(x.seq)
        breakflag = False
        # "+ 1" so the final window of the read is included
        # (the question's "- nmer - 1" skips the last two windows)
        for s in [s2[i:i+nmer] for i in range(len(s2) - nmer + 1)]:
            for ds in l_chosen:
                dist = hamming(ds, s)
                if dist == 0:
                    print(s2, s, ds, dist)
                    chosen_reads.append(x)
                    breakflag = True
                    break  # stop comparing this window against the list
            if breakflag:
                break  # stop scanning this read; go on to the next read
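As a side note, a Hamming distance of 0 between two equal-length strings simply means the strings are identical, so the same early exit can probably be written without a flag by using set membership and any(). The following is only a minimal sketch on plain strings, not the code from the question: the helper name contains_kmer and the sample reads are made up for illustration, and it assumes the reads and the 7-mers use the same letter case.

```python
def contains_kmer(seq, kmers, k=7):
    """Return True if any length-k window of seq is one of kmers."""
    kmer_set = set(kmers)
    # There are len(seq) - k + 1 windows; any() stops at the first match.
    return any(seq[i:i + k] in kmer_set for i in range(len(seq) - k + 1))


l_chosen = ['gttattt', 'attattt', 'tgctagt']

# Hypothetical reads standing in for str(x.seq) of each FASTQ record.
reads = ['aagttatttccattattt', 'cccccccccccc', 'tgctagt']

# Each read is tested once, so it can be appended at most once.
chosen_reads = [r for r in reads if contains_kmer(r, l_chosen)]
print(chosen_reads)
```

With real data, the same check would be applied to str(x.seq) for each record x, appending x when it returns True.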

Also, are you sure you want to append x to chosen_reads? That seems wrong to me. For unique matches, maybe you should be appending the s2 string and the matched ds instead, right? If that is what you want, you can append a tuple to chosen_reads as below, instead of the current appending logic:

chosen_reads.append((ds, s2)) 
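To illustrate what chosen_reads would then contain, here is a small sketch that records only the first matching 7-mer per read as a (ds, s2) tuple. The helper first_match and the two sample reads are hypothetical, introduced only for this example; returning from the helper plays the role of the two break statements.

```python
def first_match(seq, kmers, k=7):
    """Return the first length-k window of seq found in kmers, or None."""
    kmer_set = set(kmers)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in kmer_set:
            return window  # returning ends the scan of this read
    return None


l_chosen = ['gttattt', 'attattt', 'tgctagt']

# Hypothetical reads standing in for str(x.seq) of each FASTQ record.
reads = ['ccattatttgttattt', 'gggggggggg']

chosen_reads = []
for s2 in reads:
    ds = first_match(s2, l_chosen)
    if ds is not None:
        chosen_reads.append((ds, s2))

print(chosen_reads)
```

The first read contains two of the 7-mers, but only the one that occurs first in the read is recorded, so each read contributes at most one tuple.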
