Python - choosing reads with Hamming distance zero
I have a FASTQ file, reads.fastq, and a list of 7-mer strings. For each read in reads.fastq, I want to check whether it contains at least one of the 7-mer strings in the list. The condition is that if a match is found (Hamming distance == 0), the read is written to an array chosen_reads and the next read from the FASTQ file is matched. If a match is not found, the loop continues until a match is found. The output array should consist of unique reads, since the matching loop terminates once the first match is found. I wrote the following code, but the reads in the output array are not unique, since all matches with Hamming distance 0 are reported. Please suggest edits:
    import Bio.SeqIO

    def hamming(s1, s2):
        # Return the Hamming distance between equal-length sequences
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

    reads_array = []
    for x in Bio.SeqIO.parse("reads.fastq", "fastq"):
        reads_array.append(x)

    nmer = 7
    l_chosen = ['gttattt', 'attattt', 'tgctagt']
    chosen_reads = []
    for x in reads_array:
        s2 = str(x.seq)
        for s in [s2[i:i+nmer] for i in range(len(s2)-nmer-1)]:
            for ds in l_chosen:
                dist = hamming(ds, s)
                if dist == 0:
                    print(s2, s, ds, dist)
                    chosen_reads.append(x)
Your current code does not break out of the loop to read the next read from reads.fastq when it has found a string with Hamming distance 0. You should use a flag to decide when to break out, and set that flag to True when you need to break out:
    import Bio.SeqIO

    def hamming(s1, s2):
        # Return the Hamming distance between equal-length sequences
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

    reads_array = []
    for x in Bio.SeqIO.parse("reads.fastq", "fastq"):
        reads_array.append(x)

    nmer = 7
    l_chosen = ['gttattt', 'attattt', 'tgctagt']
    chosen_reads = []
    for x in reads_array:
        s2 = str(x.seq)
        breakflag = False
        # + 1 (rather than - 1) so the last windows of the read are also checked
        for s in [s2[i:i+nmer] for i in range(len(s2) - nmer + 1)]:
            for ds in l_chosen:
                dist = hamming(ds, s)
                if dist == 0:
                    print(s2, s, ds, dist)
                    chosen_reads.append(x)
                    breakflag = True
                    break  # leave the loop over l_chosen
            if breakflag:
                break  # leave the loop over windows; go on to the next read
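The same early exit can also be had without a flag. As a minimal sketch, assuming the per-read check is moved into a helper function (the name first_exact_kmer is hypothetical, and the reads below are made-up strings standing in for str(x.seq)), a single return leaves both inner loops at once:

```python
def hamming(s1, s2):
    """Return the Hamming distance between two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def first_exact_kmer(read, kmers, k=7):
    """Return the first (window, kmer) pair at Hamming distance 0, or None.

    Hypothetical helper for illustration; not part of the original code.
    """
    # There are len(read) - k + 1 windows, so the final k-mer is included.
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        for kmer in kmers:
            if hamming(kmer, window) == 0:
                return window, kmer  # ends both loops at once
    return None


# Made-up reads; with Biopython these would come from str(x.seq).
l_chosen = ["GTTATTT", "ATTATTT", "TGCTAGT"]
reads = ["AAGTTATTTCCATTATTT", "CCCCCCCCCC", "TGCTAGT"]

# Each read is tested once, so it can appear at most once in the result.
chosen_reads = [r for r in reads if first_exact_kmer(r, l_chosen) is not None]
print(chosen_reads)
```

The first read contains two of the listed 7-mers but is still kept only once, because the helper stops at the first hit. Note that the comparison is case-sensitive, so the 7-mers and the reads need to use the same letter case.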
Also, are you sure you want to be appending x to chosen_reads? That seems wrong. To get unique matches, maybe you should be appending the s2 string and the matched ds instead, right? If you want that, you can append a tuple to chosen_reads as below, instead of your current appending logic:

    chosen_reads.append((ds, s2))
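Since a Hamming distance of 0 between two equal-length strings simply means the strings are identical, the distance function is not strictly needed for this case. The following is a rough sketch under that assumption: it puts the 7-mers in a set, tests each window of a read for membership, and records one (ds, s2)-style tuple per matching read. The function name match_reads and the sample reads are hypothetical.

```python
def match_reads(reads, kmers, k=7):
    """Return one (kmer, read) tuple for each read containing a listed k-mer.

    Hypothetical helper. Assumes every k-mer has length k and that reads
    and k-mers use the same letter case.
    """
    kmer_set = set(kmers)
    matches = []
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if window in kmer_set:  # identical strings: Hamming distance 0
                matches.append((window, read))
                break  # only one inner loop, so a plain break is enough
    return matches


# Made-up reads for illustration only.
l_chosen = ["GTTATTT", "ATTATTT", "TGCTAGT"]
reads = ["CCATTATTTGTTATTT", "ACGTACGTACGT", "TTGCTAGTT"]
matches = match_reads(reads, l_chosen)
for kmer, read in matches:
    print(kmer, read)
```

With a set lookup there is only one loop to leave per read, so no flag is required. This shortcut applies only to exact matches; allowing a distance greater than 0 would need the pairwise comparison again.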