PySpark Array Key, Value
I have an RDD that stores key-value pairs, where each key is a 2D index into an array and each value is the number at that spot. For example: [((0,0),1),((0,1),2),((1,0),3),((1,1),4)]. I want to add to each key's value the values of its surrounding positions. In the example above, I want to add 1, 2 and 3 and place the sum in the (0,0) key's value spot. How can I do this?
I suggest the following:
Define a function that, given a pair (i, j), returns a list of pairs corresponding to the positions surrounding (i, j), plus the input pair (i, j) itself. For instance, let's say the function is called surrounding_pairs(pair). Then:

surrounding_pairs((0,0)) = [(0,0), (0,1), (1,0)]
surrounding_pairs((2,3)) = [(2,3), (2,2), (2,4), (1,3), (3,3)]

Of course, you need to be careful to return only valid positions.
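Here is a minimal sketch of such a function. The grid dimensions n_rows and n_cols are assumptions introduced for illustration (defaulting to the 2x2 example above); they are only needed to filter out-of-bounds positions:

def surrounding_pairs(pair, n_rows=2, n_cols=2):
    # n_rows/n_cols are assumed grid dimensions, not part of the
    # original answer; they bound the neighbourhood to valid cells.
    i, j = pair
    candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for (r, c) in candidates
            if 0 <= r < n_rows and 0 <= c < n_cols]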
Then use flatMap on the RDD as follows:

myRDD = myRDD.flatMap(lambda kv: [(p, kv[1]) for p in surrounding_pairs(kv[0])])
This will map the RDD from

[((0,0),1), ((0,1),2), ((1,0),3), ((1,1),4)]

to

[((0,0),1), ((0,1),1), ((1,0),1),
 ((0,1),2), ((0,0),2), ((1,1),2),
 ((1,0),3), ((0,0),3), ((1,1),3),
 ((1,1),4), ((1,0),4), ((0,1),4)]
This way, the value at each position is "copied" to its neighbour positions.
Finally, use reduceByKey to add up the corresponding values at each position:

from operator import add
myRDD = myRDD.reduceByKey(add)
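Putting it all together, here is a runnable end-to-end sketch; the local SparkContext setup and the 2x2 grid are assumptions for illustration, not part of the original answer:

from operator import add
from pyspark import SparkContext

sc = SparkContext("local", "neighbour-sum")  # assumed local setup for the demo
myRDD = sc.parallelize([((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)])

# Copy each value to its own position and to each valid neighbour ...
myRDD = myRDD.flatMap(lambda kv: [(p, kv[1]) for p in surrounding_pairs(kv[0])])
# ... then sum all the copies that landed on each position.
myRDD = myRDD.reduceByKey(add)

print(sorted(myRDD.collect()))
# [((0, 0), 6), ((0, 1), 7), ((1, 0), 8), ((1, 1), 9)]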
I hope this makes sense.