PySpark Array Key, Value


I have an RDD that stores key-value pairs, where each key is a pair of 2D array indices and each value is the number at that spot. For example: [((0,0),1), ((0,1),2), ((1,0),3), ((1,1),4)]. I want to add each position's value to the values at its surrounding positions. In the example above, I want to add 1, 2, and 3 and place the sum at the (0,0) key. How can I do this?

I suggest the following:

  1. Define a function that, given a pair (i,j), returns the list of pairs corresponding to the positions surrounding (i,j), plus the input pair (i,j) itself. For instance, let's say the function is called surrounding_pairs(pair). Then:

    surrounding_pairs((0,0)) = [ (0,0), (0,1), (1,0) ]
    surrounding_pairs((2,3)) = [ (2,3), (2,2), (2,4), (1,3), (3,3) ]

    Of course, you need to be careful here and return only valid positions (i.e., positions that lie within the bounds of the array).

  2. Use flatMap on the RDD as follows:

    myrdd = myrdd.flatMap(lambda pos_v: [(p, pos_v[1]) for p in surrounding_pairs(pos_v[0])])

    This maps the RDD from [((0,0),1), ((0,1),2), ((1,0),3), ((1,1),4)] to:

    [((0,0),1), ((0,1),1), ((1,0),1),
     ((0,1),2), ((0,0),2), ((1,1),2),
     ((1,0),3), ((0,0),3), ((1,1),3),
     ((1,1),4), ((1,0),4), ((0,1),4)]

    This way, the value at each position is "copied" to its neighbouring positions.

  3. Finally, use reduceByKey to add the corresponding values at each position:

    from operator import add
    myrdd = myrdd.reduceByKey(add)
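Step 1 leaves surrounding_pairs unspecified. A minimal sketch, assuming a fixed n_rows x n_cols grid (parameters not given in the original) and 4-directional neighbours:

```python
def surrounding_pairs(pair, n_rows=2, n_cols=2):
    """Return (i, j) itself plus its in-bounds 4-directional neighbours."""
    i, j = pair
    candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    # Keep only positions that fall inside the grid.
    return [(a, b) for (a, b) in candidates
            if 0 <= a < n_rows and 0 <= b < n_cols]
```

For example, surrounding_pairs((2, 3), n_rows=4, n_cols=5) yields the same five positions listed above (order may differ), while surrounding_pairs((0, 0)) drops the out-of-bounds candidates.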

I hope this makes sense.
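To check the logic without a Spark cluster, the flatMap/reduceByKey pipeline above can be simulated in plain Python. The surrounding_pairs helper and the 2x2 grid size are assumptions for this sketch:

```python
def surrounding_pairs(pair, n_rows=2, n_cols=2):
    # (i, j) itself plus its in-bounds 4-directional neighbours (assumed rule).
    i, j = pair
    candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for (a, b) in candidates
            if 0 <= a < n_rows and 0 <= b < n_cols]

data = [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)]

# Equivalent of the flatMap step: copy each value to itself and its neighbours.
expanded = [(p, v) for (pos, v) in data for p in surrounding_pairs(pos)]

# Equivalent of the reduceByKey(add) step: sum the values per position.
sums = {}
for pos, v in expanded:
    sums[pos] = sums.get(pos, 0) + v

# (0, 0) collects 1 + 2 + 3 = 6, as the question asked.
```

The same two transformations run unchanged on a real RDD; only the driver-side simulation here replaces the distributed execution.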

